Search Results: "hug"

26 August 2023

Andrew Cater: 20230826 OMGWTFBBQ - Cambridge is waking up

The meat has been fetched: those of us in the house are about to get bacon sandwiches. Pepper the dog is in the garden. Time for the mayhem to start, I think.
Various folk are travelling here so it will soon be crowded: the weather is sunny but cool and it looks good for a three day weekend.

This is a huge effort that falls to Steve and Jo and a huge disruption for them each year - for which many thanks, as ever. [And, as is traditional on this blog, the posts only ever seem to appear from Cambridge].

16 August 2023

Sam Hartman: A First Exercise with AI Training

Taking a hands-on low-level approach to learning AI has been incredibly rewarding. I wanted to create an achievable task that would motivate me to learn the tools and get practical experience training and using large language models. Just at the point when I was starting to spin up GPU instances, Llama2 was released to the public. So I elected to start with that model. As I mentioned, I m interested in exploring how sex-positive AI can help human connection in positive ways. For that reason, I suspected that Llama2 might not produce good results without training: some of Meta s safety goals run counter to what I m trying to explore. I suspected that there might be more attention paid to safety in the chat variants of Llama2 rather than the text generation variants, and working against that might be challenging for a first project, so I started with Llama-2-13b as a base. Preparing a Dataset I elected to generate a fine tuning dataset using fiction. Long term, that might not be a good fit. But I ve always wanted to understand how an LLM s tone is adjusted how you get an LLM to speak in a different voice. So much of fine tuning focuses on examples where a given prompt produces a particular result. I wanted to understand how to bring in data that wasn t structured as prompts. The Huggingface course actually gives an example of how to adjust a model set up for masked language modeling trained on wikitext to be better at predicting the vocabulary of movie reviews. There though, doing sample breaks in the dataset at movie review boundaries makes sense. There s another example of training an LLM from scratch based on a corpus of python code. Between these two examples, I figured out what I needed. It was relatively simple in retrospect: tokenize the whole mess, and treat everything as output. That is, compute loss on all the tokens. Long term, using fiction as a way to adjust how the model responds is likely to be the wrong starting point. However, it maximized focus on aspects of training I did not understand and allowed me to satisfy my curiosity. Rangling the Model I decided to actually try and add additional training to the model directly rather than building an adapter and fine tuning a small number of parameters. Partially this was because I had enough on my mind without understanding how LoRA adapters work. Partially, I wanted to gain an appreciation for the infrastructure complexity of AI training. I have enough of a cloud background that I ought to be able to work on distributed training. (As it turned out, using BitsAndBytes 8-bit optimizer, I was just able to fit my task onto a single GPU). I wasn t even sure that I could make a measurable difference in Llama-2-13b running 890,000 training tokens through a couple of training epochs. As it turned out I had nothing to fear on that front. Getting everything to work was more tricky than I expected. I didn t have an appreciation for exactly how memory intensive training was. The Transformers documentation points out that with typical parameters for mixed-precision training, it takes 18 bytes per model parameter. Using bfloat16 training and an 8-bit optimizer was enough to get things to fit. Of course then I got to play with convergence. My initial optimizer parameters caused the model to diverge, and before I knew it, my model had turned to NAN, and would only output newlines. Oops. But looking back over the logs, watching what happened to the loss, and looking at the math in the optimizer to understand how I ended up getting something that rounded to a divide by zero gave me a much better intuition for what was going on. The results. This time around I didn t do anything in the way of quantitative analysis of what I achieved. Empirically I definitely changed the tone of the model. The base Llama-2 model tends to steer away from sexual situations. It s relatively easy to get it to talk about affection and sometimes attraction. Unsurprisingly, given the design constraints, it takes a bit to get it to wonder into sexual situations. But if you hit it hard enough with your prompt, it will go there, and the results are depressing. At least for prompts I used, it tended to view sex fairly negatively. It tended to be less coherent than with other prompts. One inference managed to pop out in the middle of some text that wasn t hanging together well, Chapter 7 - Rape. With my training, I did manage to achieve my goal of getting the model to use more positive language and emotional signaling when talking about sexual situations. More importantly, I gained a practical understanding of many ways training can go wrong.

There were overfitting problems: names of characters from my dataset got more attention than I wished they did. As a model for interacting with some of the universes I used as input, that was kind of cool, but if I was looking to just adjust how the model talked about intimate situations, I massively got things to be too specific.
I gained a new appreciation for how easy it is to trigger catastrophic forgetting.
I begin to appreciate how this sort of unsupervised training could be best paired with supervised training to help correct model confusion. Playing with the model, I often ran into cases where my reaction was like Well, I don t want to train it to give that response, but if it ever does wander into this part of the state space, I d like to at least get it to respond more naturally. And I think I understand how to approach that either with custom loss functions or manipulating which tokens compute loss and which ones do not.
And of course realized I need to learn a lot about sanitizing and preparing datasets.

A lot of articles I ve been reading about training make more sense. I have better intuition for why you might want to do training a certain way, or why mechanisms for countering some problem will be important. Future Activities:

Look into LoRA adapters; having understood what happens when you manipulate the model directly, I can now move on to intelligent solutions.
Look into various mechanisms for rewards and supervised training.
See how hard it is to train a chat based model out of some of its safety constraints.
Construct datasets; possibly looking at sources like relationship questions/advice.

comments

6 August 2023

Sam Hartman: AI Tools

I wrote about how I m exploring the role of AI in human connection and intimacy. The first part of that journey has been all about learning the software and tools for approaching large language models. The biggest thing I wish I had known going in was not to focus on the traditional cloud providers. I was struggling until I found runpod.io. I kind of assumed that if you were willing to pay for it and had the money, you could go to Amazon on or google or whatever and get the compute resources you needed. Not so much. Google completely rejected my request to have the maximum number of GPUs I could run raised above a limit of 0. Go talk to your sales representative. And of course no sales representative was willing to waste their time on me. But I did eventually find some of the smaller AI-specific clouds. I intentionally wanted to run software myself. Everyone has various fine-tuning and training APIs as well as APIs for inference. I thought I d gain a much better understanding if I wrote my own code. That definitely ended up being true. I started by understanding PyTorch and the role of optimizers, gradient descent and what a model is. Then I focused on Transformers and that ecosystem, including Accelerate, tokenizers, generation and training. I m really impressed with the Hugging Face ecosystem. A lot of academic software is very purpose built and is hard to reuse and customize. But the hub strikes an amazing balance between providing abstractions for common interfaces like consuming a model or datasets without getting in the way of hacking on models or evolving the models. I had a great time, and after a number of false starts, succeeded in customizing Llama2 to explore some of the questions on my mind. I ll talk about what I accomplished and learned in the next post.

comments

4 August 2023

Shirish Agarwal: License Raj 2.0, 2023

About a week back Jio launched a laptop called JioBook that will be manufactured in China

The most interesting thing is that the whole thing will be produced in Hunan, China. Then 3 days later India mandates a licensing requirement for Apple, Dell and other laptop/tablet manufacturers. And all of these in the guise of Make in India . It is similar how India has exempted Adani and the Tatas from buying as much solar cells as are needed and then sell the same in India. Reliance will be basically monopolizing the laptop business. And if people think that projects like Raspberry Pi, Arduino etc. will be exempted they have another think coming.

History of License Raj After India became free, in the 1980s the Congress wanted to open its markets to the world just like China did. But at that time, the BJP, though small via Jan Sangh made the argument that we are not ready for the world. The indian businessman needs a bit more time. And hence a compromise was made. The compromise was simple. Indian Industry and people who wanted to get anything from the west, needed a license. This was very much in line how the Russian economy was evolving. All the three nations, India, China and Russia were on similar paths. China broke away where it opened up limited markets for competition and gave state support to its firms. Russia and Japan on the other hand, kept their markets relatively closed. The same thing happened in India, what happened in Russia and elsewhere. The businessman got what he wanted, he just corrupted the system. Reliance, the conglomerate today abused the same system as much as it could. Its defence was to be seen as the small guy. I wouldn t go into that as that itself would be a big story in itself. Whatever was sold in India was sold with huge commissions and just like Russia scarcity became the order of the day. Monopolies flourished and competition was nowhere. These remained till 1991 when Prime Minister Mr. Manmohan Singh was forced to liberalize and open up the markets. Even at that time, the RSS through its Swadeshi Jagran Manch was sharing the end of the world prophecies for the Indian businessman.

2014 Current Regime In 2010, in U.K. the Conservative party came in power under the leadership of David Cameron who was influenced by the policies of Margaret Thatcher who arguably ditched manufacturing in the UK. David Cameron and his party did the same 2010 onwards but for public services under the name austerity. India has been doing the same. The inequality has gone up while people s purchasing power has gone drastically down. CMIE figures are much more drastic and education is a joke.
Add to that since 2016 funding for scientists have gone to the dogs and now they are even playing with doctor s careers. I do not have to remind people that a woman scientist took almost a quarter century to find a drug delivery system that others said was impossible. And she did it using public finance. Science is hard. I have already shared in a previous blog post how it took the Chinese 20 years to reach where they are and somehow we think we will be able to both China and Japan. Of the 160 odd countries that are on planet earth, only a handful of countries have both the means and the knowledge to use and expand on that. While I was not part of Taiwan Debconf, later I came to know that even Taiwan in many ways is similar to Japan in the sense that a majority of its population is stuck in low-paid jobs (apart from those employed in TSMC) which is similar to Keiretsu or Chabeol from either Japan or South Korea. In all these cases, only a small percentage of the economy is going forward while the rest is stagnating or even going backwards. Similar is the case in India as well Unlike the Americans who chose the path to have more competition, we have chosen the path to have more monopolies. So even though, I very much liked Louis es project sooner or later finding the devices itself would be hard. While the recent notification is for laptops, what stops them from doing the same with mobiles or even desktop systems. As it is, both smartphones as well as desktop systems has been contracting since last year as food inflation has gone up. Add to that availability of products has been made scarce (whether by design or not, unknown.) The end result, the latest processor launched overseas becomes the new thing here 3-4 years later. And that was before this notification. This will only decrease competition and make Ambanis rich at cost of everyone else. So much for east of doing business . Also the backlash has been pretty much been tepid. So what I shared will probably happen again sooner or later. The only interesting thing is that it s based on Android, probably in part due to the issues people seeing in both Windows 10, 11 and whatnot. Till later. Update :- The print tried a decluttering but instead cluttered the topic. While what he shared all was true, and certainly it is a step backwards but he didn t need to show how most Indians had to go to RBI for the same. I remember my Mamaji doing the same and sharing afterwards that all he had was $100 for a day which while being a big sum was paltry if you were staying in a hotel and were there for company business. He survived on bananas and whatver cheap veg. he could find then. This is almost 35-40 odd years ago. As shared the Govt. has been doing missteps for quite sometime now. The print does try to take a balanced take so it doesn t run counter of the Government but even it knows that this is a bad take. The whole thing about security is just laughable, did they wake up after 9 years. And now in its own wisdom it apparently has shifted the ban instead from now to 3 months afterwards. Of course, most people on the right just applauding without understanding the complexities and implications of the same. Vendors like Samsung and Apple who have made assembly operations would do a double-think and shift to Taiwan, Vietnam, Mexico anywhere. Global money follows global trends. And such missteps do not help

Implications in A.I. products One of the things that has not been thought about how companies that are making A.I. products in India or even MNC s will suffer. Most of them right now are in stealth mode but are made for Intel or AMD or ARM depending upon how it works for them. There is nothing to tell if the companies made their plea and was it heard or unheard. If the Government doesn t revert it then sooner or later they would either have to go abroad or cash out/sell to somebody else. Some people on the right also know this but for whatever reason have chosen to remain silent. Till later

2 August 2023

Shirish Agarwal: Kaalkoot

Kaalkoot This post would be mature and would talk about death and other things. So if there are young kids or whatever kindly refrain from reading it. Just saw this series in 2 days. In a way the series encompasses all that which is wrong in India and partly the World perhaps. IMDB describes it as A police officer must deal with society s and his mother s pressure to marry, as well as frequent bullying and pressure from his superiors. But that hardly does justice to either the story or the script or the various ebbs and flows it takes. A very bit part of the series of the series is about patriarchy and the various forms it takes. It tells how we would use women and then throw them, many a times by willing relatives who want to save face . And it s so many ways and so many times that people do not even pay attention. I will not share the story as it needs to be experienced as well as the many paths the story takes as well as many paths it could have taken. What is remarkable about this series is that everyone is grey apart from the women who are victims in all of these. Even our hero, the protagonist uses it to take advantage of a woman. There are multiple stories and timelines that are just touched upon. For e.g. curing the gay and boasting he has cured many guys and now have their married with families. How many families suffered god only knows, both sexes dissatisfied At the end of the series while a slightly progressive end is shown, in reality you are left wondering whether the decision taken by the protagonist and the woman having just no agency. The hero knowing he is superior to her because of her perceived weakness. A deep-rooted malaise that is difficult to break out of. His father too and the relationship the hero longs for to have with his father who is no more. He does share some of his feelings with his mum, which touches the cord of probably every child whose mother father left them early and all those things they wanted to talk or would have chatted out if they knew this would be the last conversation they will ever have with them. Couldn t even say sorry for all the wrongs and the pain we have given them. There are just too many layers in the webseries that I would need to see it a few times to be aware of. I could sense the undercurrents but sometimes you need to see such series or movies multiple times to understand them or it could simply be the case of me being just too thick. There are also poems and poems as we know may have multiple meanings and is or can be more contextual to the person reading it rather than the creator. At the end, while it does show a positive end, in reality I feel there is no redemption for us. I am talking about men. We are too proud, too haughty and too insecure. And if things don t go the way we want, it s the women who pay the price I am not going to talk about any news either about Manipur or anywhere else because hate crimes have become normal. An RPF personnel plans, and goes from coach to coach to find Muslims and shoot them and then say only the tallest leaders in RW should be voted for. A mob then burns down Muslim s homes and businesses, all par for the course. The mentally unstable moniker taken right from the American far-right notebook.
The Americans have taken it much further than anyone else using open carry and stand your ground, laws to make blacks afraid and going further. I don t really wanna go down that route as it s a whole another pandora s box and what little I have read tells me it starts from the very beginning when the European settlers invaded America and took indigenous people s lands and giving it the moniker of Wild West . Just too much to deal with.

Mental Health But these spate of bad news, of murders, rapes and whatnot does take a toll on the mental health of people. Take this tweet as instance
I think the above tweet is an expression that is felt by many Indians, whatsoever their religion might be. Most of them unable to express it as many have responsibilities in which they are the only caretakers or the only earner in the family. So even though, we have huge inflation especially in foods and whatnot the daily struggle to put food on the table extinguishes everything else. And for those who may want to go through for whatever reasons, there is nothing like MAID in India. There was a good debate that I saw few months ago about it, and I think both the for and against miss a very crucial point. People have their own idea or imagination of what dignity in living as well as dignity of dying. I was seeing some videos of NHS doctors (UK) where many doctors couldn t do anything as their patients died as they couldn t pay bills for heating. Many of the patients wanted the doctors to end their suffering. The case against it is that people should reach out and have community services. While that is a great theory, practically it is difficult. Whether it is in dense populated area like Pune (population around 10 odd million) or the whole country of Japan which is heavily being depopulated, in both the extreme scenarios the access to mental health is and would be low. And even if there is someway that the Government, the community, business community etc. come altogether and solve it, it just shifts the problem. All the shit, our fears, our uncertainties, our doubts we unload on the medical health professional but where do they go to get rid of it. It s a vicious circular problem. I did read somewhere that mental health professionals are four times prone to suicide than other doctors. And all emergency care professionals like firefighters and whatnot are again 4 times more likely to commit suicide than the general population. How much those stats are true, have no clue as again most of such kinds of data is not collected by NCRB (National Crime Records Bureau) in India. In fact, NCRB often describes such deaths as accidental deaths as otherwise the person would be termed as loser or something else. Even in and after death, people are worried about labels. But that I guess is what s it all about. I do not know but do guess most of the 160 odd countries would have similar issues and most of them keep quiet about it. Till later

1 August 2023

Jonathan Dowland: Interzone's new home

IZ #294, the latest issue

The long running British¹ SF Magazine Interzone has a new home and new editor, Gareth Jelley, starting with issue 294. It's also got a swanky new format ("JB6"): a perfect-bound, paperback novel size, perfect for fitting into an oversize coat or jeans pocket for reading on the train. I started reading Interzone in around 2003, having picked up an issue (#176) from Feb 2002 that was languishing on the shelves in Forbidden Planet. Once I discovered it I wondered why it had taken me so long. That issue introduced me to Greg Egan. I bought a number of back issues on eBay, to grab issues with stories by people including Terry Pratchett, Iain Banks, Alastair Reynolds, and others.

IZ #194: The first by TTA press

A short while later in early 2004, after 22 years, Interzone's owner and editorship changed from David Pringle to Andy Cox and TTA Press. I can remember the initial transition was very jarring: the cover emphasised expanding into coverage of Manga, Graphic Novels and Video Games (which ultimately didn't happen) but after a short period of experimentation it quickly settled down into a similarly fantastic read. I particularly liked the move to a smaller, perfect-bound form-factor in 2012. I had to double-check this but I'd been reading IZ throughout the TTA era and it lasted 18 years! Throughout that time I have discovered countless fantastic authors that I would otherwise never have experienced. Some (but by no means all) are Dominic Green, Daniel Kaysen, Chris Beckett, C cile Cristofari, Aliya Whiteley, Tim Major, Fran oise Harvey, Will McIntosh. Cox has now retired (after 100 issues and a tenure almost as long as Pringle) and handed the reins to Gareth Jelley/MYY Press, who have published their first issue, #294. Jelley is clearly putting a huge amount of effort into revitalizing the magazine. There's a new homepage at interzone.press but also companion internet presences: a plethora of digital content at interzone.digital, Interzone Socials (a novel idea), a Discord server, a podcast, and no doubt more. Having said that, the economics of small magazines have been perilous for a long time, and that hasn't changed, so I think the future of IZ (in physical format at least) is in peril. If you enjoy short fiction, fresh ideas, SF/F/Fantastika; why not try a subscription to Interzone, whilst you still can!

Interzone has always been "British", in some sense, but never exclusively so. I recall fondly a long-term project under Pringle to publish a lot of Serbian writer Zoran ivkovi , for example, and the very first story I read was by Australian Greg Egan. Under Jelley, the magazine is being printed in Poland and priced in Euros. I expect it to continue to attract and publish writers from all over the place.

29 July 2023

Shirish Agarwal: Manipur, Data Leakage, Aadhar, and IRCv3

Manipur Lot of news from Manipur. Seems the killings haven t stopped. In fact, there was a huge public rally in support of the rapists and murderers as reported by Imphal Free Press. The Ruling Govt. both at the Center and the State being BJP continuing to remain mum. Both the Internet shutdowns have been criticized and seems no effect on the Government. Their own MLA was attacked but they have chosen to also be silent about that. The opposition demanded that the PM come in both the houses and speak but he has chosen to remain silent. In that quite a few bills were passed without any discussions. If it was not for the viral videos nobody would have come to know of anything . Internet shutdowns impact women disproportionately as more videos of assaults show Of course, as shared before that gentleman has been arrested under Section 66A as I shared in the earlier blog post. In any case, in the last few years, this Government has chosen to pass most of its bills without any discussions. Some of the bills I will share below. The attitude of this Govt. can be seen through this cartoon
The above picture shows the disqualified M.P. Rahul Gandhi because he had asked what is the relationship between Adani and Modi. The other is the Mr. Modi, the Prime Minister who refuses to enter and address the Parliament. Prem Panicker shares how we chillingly have come to this stage when even after rapes we are silent

Data Leakage According to most BJP followers this is not a bug but a feature of this Government. Sucheta Dalal of Moneylife shared how the data leakage has been happening at the highest levels in the Government. The leakage is happening at the ministerial level because unless the minister or his subordinate passes a certain startup others cannot come to know. As shared in the article, while the official approval may take 3-4 days, within hours other entities start congratulating. That means they know that the person/s have been approved.While reading this story, the first thought that immediately crossed my mind was data theft and how easily that would have been done. There was a time when people would be shocked by articles such as above and demand action but sadly even if people know and want to do something they feel powerless to do anything

PAN Linking and Aadhar Last month GOI made PAN Linking to Aadhar a thing. This goes against the judgement given by the honored Supreme Court in September 2018. Around the same time, Moneylife had reported on the issue on how the info. on Aadhar cards is available and that has its consequences. But to date nothing has happened except GOI shrugging. In the last month, 13 crore+ users of PAN including me affected by it I had tried to actually delink the two but none of the banks co-operated in the same Aadhar has actually number of downsides, most people know about the AEPS fraud that has been committed time and time again. I have shared in previous blog posts the issue with biometric data as well as master biometric data that can and is being used for fraud. GOI either ignorant or doesn t give a fig as to what happens to you, citizen of India. I could go on and on but it would result in nothing constructive so will stop now

IRCv3 I had been enthused when I heard about IRCV3. While it was founded in 2016, it sorta came on in its own in around 2020. I did try matrix or rather riot-web and went through number of names while finally setting on element. While I do have the latest build 1.11.36 element just hasn t been workable for me. It is too outsized, and occupies much more real estate than other IM s (Instant Messengers and I cannot correct size it like I do say for qbittorrent or any other app. I had filed couple of bugs on it but because it apparently only affects me, nothing happened afterwards But that is not the whole story at all. Because of Debconf happening in India, and that too Kochi, I decided to try out other tools to see how IRC is doing. While the Debian wiki page shares a lot about IRC clients and is also helpful in sharing stats by popcounter ( popularity-contest, thanks to whoever did that), it did help me in trying two of the most popular clients. Pidgin and Hexchat, both of which have shared higher numbers. This might be simply due to the fact that both get downloaded when you install the desktop version or they might be popular in themselves, have no idea one way or the other. But still I wanted to see what sort of experience I could expect from both of them in 2023. One of the other things I noticed is that Pidgin is not a participating organization in ircv3 while hexchat is. Before venturing in, I also decided to take a look at oftc.net. Came to know that for sometime now, oftc has started using web verify. I didn t see much of a difference between hcaptcha and gcaptcha other than that the fact that they looked more like oil paintings rather than anything else. While I could easily figure the odd man out or odd men out to be more accurate, I wonder how a person with low or no vision would pass that ??? Also much of our world is pretty much contextual based, figuring who the odd one is or are could be tricky. I do not have answers to the above other than to say more work needs to be done by oftc in that area. I did get a link that I verified. But am getting ahead of the story. Another thing I understood that for some reason oftc is also not particpating in ircv3, have no clue why not :(I

Account Registration in Pidgin and Hexchat This is the biggest pain point in both. I failed to register via either Pidgin or Hexchat. I couldn t find a way in either client to register my handle. I have had on/off relationships with IRC over the years, the biggest issue being IIRC is that if you stop using your handle for a month or two others can use it. IIRC, every couple of months or so, irc/oftc releases the dormant ones. Matrix/Vector has done quite a lot in that regard but that s a different thing altogether so for the moment will keep that aside. So, how to register for the network. This is where webchat.oftc.net comes in. You get a quaint 1970 s IRC window (probably emulated) where you call Nickserv to help you. As can be seen it one of the half a dozen bots that helps IRC. So the first thing you need to do is `/msg nickserv help` what you are doing is asking nickserv what services they have and Nickserv shares the numbers of services it offers. After looking into, you are looking for register `/msg nickerv register` Both the commands tell you what you need to do as can be seen by this
Let s say you are XYZ and your e-mail address is xyz@xyz.com This is just a throwaway id I am taking for the purpose of showing how the process is done. For this, also assume your passowrd is 1234xyz;0x something like this. I have shared about APG (Advanced Password Generator) before so you could use that to generate all sorts of passwords for yourself. So next would be `/msg nickserv register 1234xyz;0x xyz@xyz.com` Now the thing to remember is you need to be sure that the email is valid and in your control as it would generate a link with hcaptcha. Interestingly, their accessibility signup fails or errors out. I just entered my email and it errors out. Anyway back to it. Even after completing the puzzle, even with the valid username and password neither pidgin or hexchat would let me in. Neither of the clients were helpful in figuring out what was going wrong. At this stage, I decided to see the specs of ircv3 if they would help out in anyway and came across this. One would have thought that this is one of the more urgent things that need to be fixed, but for reasons unknown it s still in draft mode. Maybe they (the participants) are not in consensus, no idea. Unfortunately, it seems that the participants of IRCv3 have chosen a sort of closed working model as the channel is restricted. The only notes of any consequence are being shared by Ilmari Lauhakangas from Finland. Apparently, Mr/Ms/they Ilmari is also a libreoffice hacker. It is possible that their is or has been lot of drama before or something and that s why things are the way they are. In either way, doesn t tell me when this will be fixed, if ever. For people who are on mobiles and whatnot, without element, it would be 10x times harder. Update :- Saw this discussion on github. Don t see a way out It seems I would be unable to unable to be part of Debconf Kochi 2023. Best of luck to all the participants and please share as much as possible of what happens during the event.

26 July 2023

Shirish Agarwal: Manipur Violence, Drugs, Binging on Northshore, Alaska Daily, Doogie Kamealoha and EU Digital Resilence Act.

Manipur Videos Warning: The text might be mature and will have references to violence so if there are kids or you are sensitive, please excuse. Few days back, saw the videos and I cannot share the rage, shame and many conflicting emotions that were going through me. I almost didn t want to share but couldn t stop myself. The woman in the video were being palmed, fingered, nude, later reportedly raped and murdered. And there have been more than a few cases. The next day saw another video that showed beheaded heads, and Kukis being killed just next to their houses. I couldn t imagine what those people must be feeling as the CM has been making partisan statements against them. One of the husbands of the Kuki women who had been paraded, fondled is an Army Officer in the Indian Army. The Meiteis even tried to burn his home but the Army intervened and didn t let it get burnt. The CM s own statement as shared before tells his inability to bring the situation out of crisis. In fact, his statement was dumb stating that the Internet shutdown was because there were more than 100 such cases. And it s spreading to the nearby Northeast regions. Now Mizoram, the nearest neighbor is going through similar things where the Meitis are not dominant. The Mizos have told the Meitis to get out. To date, the PM has chosen not to visit Manipur. He just made a small 1 minute statement about it saying how the women have shamed India, an approximation of what he said.While it s actually not the women but the men who have shamed India. The Wire has been talking to both the Meitis, the Kukis, the Nagas. A Kuki women sort of bared all. She is right on many counts. The GOI while wanting to paint the Kukis in a negative light have forgotten what has been happening in its own state, especially its own youth as well as in other states while also ignoring the larger geopolitics and business around it. Taliban has been cracking as even they couldn t see young boys, women becoming drug users. I had read somewhere that 1 in 4 or 1 in 5 young person in Afghanistan is now in its grip. So no wonder,the Taliban is trying to eradicate and shutdown drug use among it s own youth. Circling back to Manipur, I was under the wrong impression that the Internet shutdown is now over. After those videos became viral as well as the others I mentioned, again the orders have been given and there is shutdown. It is not fully shut but now only Govt. offices have it. so nobody can share a video that goes against any State or Central Govt. narrative A real sad state of affairs Update: There is conditional reopening whatever that means When I saw the videos, the first thing is I felt was being powerless, powerless to do anything about it. The second was if I do not write about it, amplify it and don t let others know about it then what s the use of being able to blog

Mental Health, Binging on various Webseries Both the videos shocked me and I couldn t sleep that night or the night after. it. Even after doing work and all, they would come in unobtrusively in my nightmares While I felt a bit foolish, I felt it would be nice to binge on some webseries. Little I was to know that both Northshore and Alaska Daily would have stories similar to what is happening here While the story in Alaska Daily is fictional it resembles very closely to a real newspaper called Anchorage Daily news. Even there the Intuit women , one of the marginalized communities in Alaska. The only difference I can see between GOI and the Alaskan Government is that the Alaskan Government was much subtle in doing the same things. There are some differences though. First, the State is and was responsive to the local press and apart from one close call to one of its reporters, most reporters do not have to think about their own life in peril. Here, the press cannot look after either their livelihood or their life. It was a juvenile kid who actually shot the video, uploaded and made it viral. One needs to just remember the case details of Siddique Kappan. Just for sharing the news and the video he was arrested. Bail was denied to him time and time again citing that the Police were investigating . Only after 2 years and 3 months he got bail and that too because none of the charges that the Police had they were able to show any prima facie evidence. One of the better interviews though was of Vrinda Grover. For those who don t know her, her Wikipedia page does tell a bit about her although it is woefully incomplete. For example, most recently she had relentlessly pursued the unconstitutional Internet Shutdown that happened in Kashmir for 5 months. Just like in Manipur, the shutdown was there to bury crimes either committed or being facilitated by the State. For the issues of livelihood, one can take the cases of Bipin Yadav and Rashid Hussain. Both were fired by their employer Dainik Bhaskar because they questioned the BJP MP Smriti Irani what she has done for the state. The problems for Dainik Bhaskar or for any other mainstream media is most of them rely on Government advertisements. Private investment in India has fallen to record lows mostly due to the policies made by the Centre. If any entity or sector grows a bit then either Adani or Ambani will one way or the other take it. So, for most first and second generation entrepreneurs it doesn t make sense to grow and then finally sell it to one of these corporates at a loss GOI on Adani, Ambani side of any deal. The MSME sector that is and used to be the second highest employer hasn t been able to recover from the shocks of demonetization, GST and then the pandemic. Each resulting in more and more closures and shutdowns. Most of the joblessness has gone up tremendously in North India which the Government tries to deny. The most interesting points in all those above examples is within a month or less, whatever the media reports gets scrubbed. Even the firing of the journos that was covered by some of the mainstream media isn t there anymore. I have to use secondary sources instead of primary sources. One can think of the chilling effects on reportage due to the above. The sad fact is even with all the money in the world the PM is unable to come to the Parliament to face questions.
The above poster/question is by Surya Pratap Singh, a retired IAS officer. He asks why the PM is unable to answer in either of the houses. As shared before, the Govt. wants very limited discussion. Even yesterday, the Lok Sabha TV just showed the BJP MP s making statements but silent or mic was off during whatever questions or statements made by the opposition. If this isn t mockery of Indian democracy then I don t know what is Even the media landscape has been altered substantially within the last few years. Both Adani and Ambani have distributed the media pie between themselves. One of the last bastions of the free press, NDTV was bought by Adani in a hostile takeover. Both Ambani and Adani are close to this Goverment. In fact, there is no sector in which one or the other is not present. Media houses like Newsclick, The Wire etc. that are a fraction of mainstream press are where most of the youth have been going to get their news as they are not partisan. Although even there, GOI has time and again interfered. The Wire has had too many 504 Gateway timeouts in the recent months and they had been forced to move most of their journalism from online to video, rather Youtube in order to escape both the censoring and the timeouts as shared above. In such a hostile environment, how both the organizations are somehow able to survive is a miracle. Most local reportage is also going to YouTube as that s the best way for them to not get into Govt. censors. Not an ideal situation, but that s the way it is. The difference between Indian and Israeli media can be seen through this
The above is a Screenshot shared by how the Israeli media has reacted to the Israeli Government s Knesset over the judicial overhaul . Here, the press itself erodes its own by giving into the Government day and night

Binging on Webseries Saw Northshore, Three Pines, Alaska Daily and Doogie Kamealoha M.D. which is based on Doogie Howser M.D. Of the four, enjoyed Doogie Kamealoha M.D. the most but then it might be because it s a copy of Doogie Howser, just updated to the new millenia and there are some good childhood memories associated with that series. The others are also good. I tried to not see European stuff as most of them are twisted and didn t want that space.

EU Digital Operational Resilience Act and impact on FOSS Few days ago, apparently the EU shared the above Act. One can read about it more here. This would have more impact on FOSS as most development of various FOSS distributions happens in EU. Fair bit of Debian s own development happens in Germany and France. While there have been calls to make things more clearer, especially for FOSS given that most developers do foss development either on side or as a hobby while their day job is and would be different. The part about consumer electronics and FOSS is a tricky one as updates can screw up your systems. Microsoft has had a huge history of devices not working after an update or upgrade. And this is not limited to Windows as they would like to believe. Even apple seems to be having its share of issues time and time again. One would have hoped that these companies that make billions of dollars from their hardware and software sales would be doing more testing and Q&A and be more aware about security issues. FOSS, on the other hand while being more responsive doesn t make as much money vis-a-vis the competitors. Let s take the most concrete example. The most successful mobile phone having FOSS is Purism. But it s phone, it has priced itself out of the market. A huge part of that is to do with both economies of scale and trying to get an infrastructure and skills in the States where none or minimally exists. Compared that to say Pinepro that is manufactured in Hong Kong and is priced 1/3rd of the same. For most people it is simply not affordable in these times. Add to that the complexity of these modern cellphones make it harder, not easier for most people to be vigilant and update the phone at all times. Maybe we need more dumphones such as Light and Punkt but then can those be remotely hacked or not, there doesn t seem to be any answers on that one. I haven t even seen anybody even ask those questions. They may have their own chicken and egg issues. For people like me who have lost hearing, while I can navigate smartphones for now but as I become old I don t see anything that would help me. For many an elderly population, both hearing and seeing are the first to fade. There doesn t seem to be any solutions targeted for them even though they are 5-10% of any population at the very least. Probably more so in Europe and the U.S. as well as Japan and China. All of them are clearly under-served markets but dunno a solution for them. At least to me that s an open question.

21 July 2023

Gunnar Wolf: Road trip through mountain ridges to find the surreal

We took a couple of days of for a family vacation / road trip through the hills of Central Mexico. The overall trip does not look like anything out of the ordinary

Other than the fact that Google forecasted we d take approximately 15.5 hours driving for 852Km that is, an average of almost 55 Km/h. And yes, that s what we signed up for. And that s what we got. Of course, the exact routes are not exactly what Google suggested (I can say we optimized a bit the route, i.e., by avoiding the metropolitan area of Quer taro, at the extreme west, and going via San Juan del R o / Tequisquiapan / Bernal). The first stretch of the road is just a regular, huge highway, with no particular insights. The highways leaving and entering Mexico City on the North are not fun nor beautiful, only they are needed to get nice trips going Mexico City sits at a point of changing climates. Of course, it is a huge city And I cannot imagine how it would be without all of the urbanization it now sports. But anyway: On the West, South, and part of the East, it is surrounded by high mountains, with beautiful and dense forests. Mexico City is 2200m high, and most of the valley s surrounding peaks are ~3000m (and at the South Eastern tip, our two big volcanoes, Popocat petl and Iztacc huatl, get past the 5700m mark). Towards the North, the landscape is flatter and much more dry. Industrial compounds give way to dry grasslands. Of course, central Mexico does not understand the true meaning of flat, and the landscape is full with eh-not-very-big mountains. Then, as we entered Quer taro State, we started approaching Bernal. And we saw a huge rock that looks like it is not supposed to be there! It just does not fit the surroundings.

Shortly after Bernal, we entered a beautiful, although most crumpled, mountain ridge: Sierra Gorda de Quer taro. Sierra Gorda encompasses most of the North of the (quite small 11500Km total) state of Quer taro, plus portions of the neighboring states; other than the very abrupt and sharp orography, what strikes me most is the habitat diversity it encompasses. We started going up an absolute desert, harsh and beautiful; we didn t take pictures along the way as the road is difficult enough that there are almost no points for stopping for refreshments or for photo opportunities. But it is quite majestic. And if you think deserts are barren, boring places well, please do spend some time enjoying them! Anyway At on point, the road passes by a ~3100m height, and suddenly Pines! More pines! A beautiful forest! We reached our first stop at the originally mining town of Pinal de Amoles.

After spending the night there and getting a much needed rest, we started a quite steep descent towards Jalpan de Serra. While it is only ~20Km away on the map, we descended from 2300 to 760 meters of altitude (and the road was over 40Km long).

Being much lower, the climate drastically changed from cool and humid to quite warm and the body attitude in the kids does not lie!

In the mid-18th century, Fray Jun pero Serra established five missions to evangelize the population of this very harsh territory, and the frontispiece for the church and monastery in Jalpan is quite breathtaking.

But we were just passing by Jalpan. A short visit to the church and to the ice-cream shop, and we were again on our way. We crossed the state border, entering San Luis Potos , and arrived to our main destination: Xilitla, the little town in the beautiful Huasteca where the jungle meets surrealism. Xilitla was chosen by the British poet and patron of various surrealist artists https://en.wikipedia.org/wiki/Edward_James. He was a British noble (an unofficial grandson of King Edward VII), and heir to a huge fortune. I m not going to repeat here his very well known biography suffice to say that he got in love with the Huasteca, and bought a >30ha piece of jungle and mountain close to the Xilitla town, and made it his house. With very ample economic resources, in the late 1940s he started his lifelong project of building a surrealist garden. And Well, that s enough blabbering for me. I m sharing some pictures I took there. The place is plainly magic and wonderful. Edward James died in 1984, and his will decrees that after his death, the jungle should be allowed to reclaim the constructions so many structures are somewhat crumbling, and it is expected they will break down in the following decades. But for whoever comes to Mexico This magic place is definitely worth the heavy ride to the middle of the mountains and to the middle of the jungle.

Xilitla now also hosts a very good museum with sculptures by Leonora Carrington, James long-time friend, but I m not going to abuse this space with even more pictures. And of course, we did more, and enjoyed more, during our three days in Xilitla. And for our way back I wanted to try a different route. We decided to come back to Mexico City crossing Hidalgo state instead of Quer taro. I had feared the roads would be in a worse shape or would be more difficult to travel And I was happy to be proven wrong!

This was the longest driving stretch approximately 6:30 for 250Km. The roads are in quite decent shape, and while there are some stretches where we were quite lonely (probably the loneliest one was the sharp ascent from Tamazunchale to the detour before Orizatl n), the road felt safe and well kept at all times. The sights all across Eastern Hidalgo are breathtaking, and all furiously green (be it with really huge fern leaves or with tall, strong pines), until Zacualtip n.

And just as abruptly or more as when we entered Pinal de Amoles We crossed Orizatl n, and we were in a breathtaking arid, desert-like environment again. We crossed the Barranca de Metztitl n natural reserve, and arrived to spend the night at Huasca de Ocampo. There are many more things we could have done starting at Huasca, a region where old haciendas thrived, full of natural formations, and very very interesting. But we were tired and pining to be finally back home. So we rested until mid-morning and left straight back home in Mexico City. Three hours later, we were relaxing, preparing lunch, the kids watching whatever-TV-like-things are called nowadays. All in all, a very beautiful vacation!

19 July 2023

Shirish Agarwal: RISC-V, Chips Act, Burning of Books, Manipur

RISC -V Motherboard, SBC While I didn t want to, a part of me is hyped about this motherboard. This would probably be launched somewhere in November. There are obvious issues in this, the first being unlike regular motherboards you wouldn t be upgrade as you would do.You can t upgrade your memory, can t upgrade the CPU (although new versions of instructions could be uploaded, similar to BIOS updates) but as the hardware is integrated (the quad-core SiFive Performance P550 core complex) it would really depend. If the final pricing is around INR 4-5k then it may be able to sell handsomely provided there are people to push and provide support around it. A 500 GB or 1 TB SSD coupled with it and a cheap display unit and you could use it anywhere although as the name says it s more for tinkering as the name suggests. Another board that could perhaps be of more immediate use would be the beagleboard. They launched the same couple of days back and called it Beagle V-Ahead. Again, costs are going to be a concern. Just a year before the pandemic the Beagleboard Black (BB) used to cost in the sub 4k range, today it costs 8k+ for the end user, more than twice the price. How much Brexit is to be blamed for this and how much the Indian customs we would never know. The RS Group that is behind that shop is head-quartered in the UK. As said before, we do not know the price of either board as it probably will take few months for v-ahead to worm its way in the Indian market, maybe another 6 months or so. Even so, with the limited info. on both the boards, I am tilting more towards the other HiFive one. We should come to know about the boards say in 3-5 months of time.

CHIPS Act I had shared about the Chips Act a few times here as well as on SM. Two articles do tell how the CHIPS Act 2023 is more of a political tool, an industrial defence policy rather than just business as most people tend to think.

Cancelation of Books, Books Burning etc. Almost 2400 years ago, Plato released his work called Plato s Republic and one of the seminal works within it is perhaps one of the most famous works was the Allegory of the Cave. That is used again and again in a myriad ways, mostly in science-fiction though and mostly to do with utopian, dystopian movies, webseries etc. I did share how books are being canceled in the States, also a bit here. But the most damning thing has happened throughout history, huge quantities of books burned almost all for politics But part of it has been neglect as well as this time article shares. What we have lost and continue to lose is just priceless. Every book has a grain of truth in it, some more, some less but equally enjoyable. Most harmful is the neglect towards books and is more true today than any other time in history. Kids today have a wide variety of tools to keep themselves happy or occupied, from anime, VR, gaming the list goes on and on. In that scenario, how the humble books can compete. People think of Kindle but most e-readers like Kindle are nothing but obsolescence by design. I have tried out Kindle a few times but find it a bit on the flimsy side. Books are much better IMHO or call me old-school. While there are many advantages, one of the things that I like about books is that you can easily put yourself in either the protagonist or the antagonist or somewhere in the middle and think of the possible scenarios wherever you are in a particular book. I could go on but it will be a blog post or two in itself. Till later. Happy Reading.

Update:Manipur Extremely horrifying visuals, articles and statements continue to emanate from Manipur. Today, 19th July 2023, just couple of hours back, a video surfaced showing two Kuki women were shown as stripped, naked and Meitei men touching their private parts. Later on, we came to know that this was in response of a disinformation news spread by the Meitis of few women being raped although no documentary evidence of the same surfaced, no names nothing. While I don t want to share the video I will however share the statement shared by the Kuki-Zo tribal community on that. The print gives a bit more context to what has been going on.
Update, Few hours later : The Print also shared more of a context about six days ago. The reason we saw the video now was that for the last 2.5 months Manipur was in Internet shutdown so those videos got uploaded now. There was huge backlash from the Twitter community and GOI ordered the Manipur Police to issue this Press Release yesterday night or just few hours before with yesterday s time-stamp.
IndianExpress shared an article that does state that while an FIR had been registered immediately no arrests so far and this is when you can see the faces of all the accused. Not one of them tried to hide their face behind a mask or something. So, if the police wanted, they could have easily identified who they are. They know which community the accused belong to, they even know from where they came. If they wanted to, they could have easily used mobile data and triangulation to find the accused and their helpers. So, it does seem to be attempt to whitewash and protect a certain community while letting it prey on the other. Another news that did come in, is because of the furious reaction on Twitter, Youtube has constantly been taking down the video as some people are getting a sort of high more so from the majoritarian community and making lewd remarks. Twitter has been somewhat quick when people are making lewd remarks against the two girl/women. Quite a bit of the above seems like a cover-up. Lastly, apparently GOI has agreed to having a conversation about it in Lok Sabha but without any voting or passing any resolutions as of right now. Would update as an when things change. Update: Smriti Irani, the Child and Development Minister gave the weakest statement possible
As can be noticed, she said sexual assault rather than rape. The women were under police custody for safety when they were whisked away by the mob. No mention of that. She spoke to the Chief Minister who has been publicly known as one of the provocateurs or instigators for the whole thing. The CM had publicly called the Kukus and Nagas as foreigners although both of them claim to be residing for thousands of years and they apparently have documentary evidence of the same . Also not clear who is doing the condemning here. No word of support for the women, no offer of intervention, why is she the Minister of Child and Women Development (CDW) if she can t use harsh words or give support to the women who have gone and going through horrific things Update : CM Biren Singh s Statement after the video surfaced
This tweet is contradictory to the statements made by Mr. Singh couple of months ago. At that point in time, Mr. Singh had said that NIA, State Intelligence Departments etc. were giving him minute to minute report on the ground station. The Police itself has suo-moto (on its own) powers to investigate and apprehend criminals for any crime. In fact, the Police can call for questioning of anybody in any relation to any crime and question them for upto 48 hours before charging them. In fact, many cases have been lodged where innocent persons have been framed or they have served much more in the jail than the crime they are alleged to have been committed. For e.g. just a few days before there was a media report of a boy who has been in jail for 3 years. His alleged crime, stealing mere INR 200/- to feed himself. Court doesn t have time to listen to him yet. And there are millions like him. The quint eloquently shares the tale where it tells how both the State and the Centre have been explicitly complicit in the incidents ravaging Manipur. In fact, what has been shared in the article has been very true as far as greed for land is concerned. Just couple of weeks back there have been a ton of floods emanating from Uttarakhand and others. Just before the flooding began, what was the CM doing can be seen here. Apart from the newspapers I have shared and the online resources, most of the mainstream media has been silent on the above. In fact, they have been silent on the Manipur issue until the said video didn t come into limelight. Just now, in Lok Sabha everybody is present except the Prime Minister and the Home Minister. The PM did say that the law will take its own course, but that s about it. Again no support for the women concerned. Update: CJI (Chief Justice of India) has taken suo-moto cognizance and has warned both the State and Centre to move quickly otherwise they will take the matter in their own hand.
Update: Within 2 hours of the CJI taking suo-moto cognizance, they have arrested one of the main accused Heera Das
The above tells you why the ban on Internet was put in the first place. They wanted to cover it all up. Of all the celebs, only one could find a bit of spine, a bit of backbone to speak about it, all the rest mum
Just imagine, one of the women is around my age while the young one could have been a daughter if I had married on time or a younger sister for sure. If ever I came face to face with them, I just wouldn t be able to look them in the eye. Even their whole whataboutery is built on sham. From their view Kukis are from Burma or Burmese descent. All of which could be easily proved by DNA of all. But let s leave that for a sec. Let s take their own argument that they are Burmese. Their idea of Akhand Bharat stretches all the way to Burma (now called Myanmar). They want all the land but no idea with what to do with the citizens living on it. Even after the video, the whataboutery isn t stopping, that shows how much hatred is there. And not knowing that they too will be victim of the same venom one or the other day Update: Opposition was told there would be a debate on Manipur. The whole day went by, no debate. That s the shamelessness of this Govt. Update 20th July 19:25 Center may act or not act against the perpetrators but they will act against Twitter who showed the crime. Talk about shooting the messenger
We are now in the last stage. In 2014, we were at 6

12 July 2023

Jonathan Carter: Debian 12

Last month, the Debian project released Debian 12, with the usual low-key fanfare that Debian usually announces a release with. Even though we shipped with some bugs we d rather not have (which are at least fixable over the next point releases), the feedback for Debian 12 has been nothing short of overwhelmingly good. From users to the independent Linux blogs to the YouTubers to the tech news sites, the vast majority of feedback so far is that it s not just one of Debian s most important releases, but also an incredibly timely one, considering other distributions with stable releases- most of which who come from commercial companies who do not necessarily have their user s best interest at heart. I deleted my original draft for this blog post, it was getting incredibly long, and I ll split it up over a few upcoming posts instead. The most important message I want to convey here is a HUGE THANK YOU to everyone who has contributed to Debian 12 in any form. Your work is incredibly important, and the result of that work over this release was nothing short of spectacular. Consider yourself a hero!

John Goerzen: Backing Up and Archiving to Removable Media: dar vs. git-annex

This is the fourth in a series about archiving to removable media (optical discs such as BD-Rs and DVD+Rs or portable hard drives). Here are the first three parts:

In part 1, I laid out my goals for the project, and considered a number of tools before determining dar and git-annex were my leading options.
In part 2, I took a deep dive into git-annex and simulated using it for this project.
In part 3, I did the same with dar.
And in this part, I want to put it together to come up with an initial direction to pursue.

I want to state at the outset that this is not a general review of dar or git-annex. This is an analysis of how those tools stack up to a particular use case. Neither tool focuses on this use case, and I note it is particularly far from the more common uses of git-annex. For instance, both tools offer support for cloud storage providers and special support for ssh targets, but neither of those are in-scope for this post. Comparison Matrix As part of this project, I made a comparison matrix which includes not just dar and git-annex, but also backuppc, bacula/bareos, and borg. This may give you some good context, and also some reference for other projects in this general space. Reviewing the Goals I identified some goals in part 1. They are all valid. As I have thought through the project more, I feel like I should condense them into a simpler ordered list, with the first being the most important. I omit some things here that both dar and git-annex can do (updates/incrementals, for instance; see the expanded goals list in part 1). Here they are:

The tool must not modify the source data in any way.
It must be simple to create or update an archive. Processes that require a lot of manual work, are flaky, or are difficult to do correctly, are unlikely to be done correctly and often. If it s easy to do right, I m more likely to do it. Put another way: an archive never created can never be restored.
The chances of a successful restore by someone that is not me, that doesn t know Linux, and is at least 10 years in the future, should be maximized. This implies a simple toolset, solid support for dealing with media errors or missing media, etc.
Both a partial point-in-time restore and a full restore should be possible. The full restore must, at minimum, provide a consistent directory tree; that is, deletions, additions, and moves over time must be accurately reflected. Preserving modification times is a near-requirement, and preserving hard links, symbolic links, and other POSIX metadata is a significant nice-to-have.
There must be a strategy to provide redundancy; for instance, a way for one set of archive discs to be offsite, another onsite, and the two to be periodically swapped.
Use storage space efficiently.

Let s take a look at how the two stack up against these goals. Goal 1: Not modifying source data With dar, this is accomplished. dar --create does not modify source data (and even has a mode to avoid updating atime) so that s done. git-annex normally does modify source data, in that it typically replaces files with symlinks into its hash-indexed storage directory. It can instead use hardlinks. In either case, you will wind up with files that have identical content (but may have originally been separate, non-linked files) linked together with git-annex. This would cause me trouble, as well as run the risk of modifying timestamps. So instead of just storing my data under a git-annex repo as is its most common case, I use the directory special remote with importtree=yes to sort of import the data in. This, plus my desire to have the repos sensible and usable on non-POSIX operating systems, accounts for a chunk of the git-annex complexity you see here. You wouldn t normally see as much complexity with git-annex (though, as you will see, even without the directory special remote, dar still has less complexity). Winner: dar, though I demonstrated a working approach with git-annex as well. Goal 2: Simplicity of creating or updating an archive Let us simply start by recognizing this:

Number of commands to create a first dar archive, including all splits: 1
Number of commands to create a first git-annex archive, with just the first two splits: 58
Number of commands to create a dar incremental: 1
Number of commands to update the last git-annex drive: 10
Number of commands to do a full restore of all slices and both archives with dar: 2 (1 if dar_manager used)
Number of commands to do a full restore of just the first two drive with git-annex: 9 (but my process may not be correct)

Both tools have a lot of power, but I must say, it is easier to wrap my head around what dar is doing than what git-annex is doing. Everything dar does is with files: here are the files to archive, here is an archive file, here is a detached (isolated) catalog. It is very straightforward. It took me far less time to develop my dar page than my git-annex page, despite having existing familiarity with both tools. As I pointed out in part 2, I still don t fully understand how git-annex syncs metadata. Unsolved mysteries from that post include why the two git-annex drives had no idea what was on the other drives, and why the export operation silenty did nothing. Additionally, for the optical disc case, I had to create a restricted-size filesystem/dataset for git-annex to write into in order to get the desired size limit. Looking at the optical disc case, dar has a lot of nice infrastructure built in. With pause and execute, it can very easily be combined with disc burning operations. slice will automatically limit the size of a given slice, regardless of how much disk space is free, meaning that the git-annex tricks of creating smaller filesystems/datasets are unnecessary with dar. To create an initial full backup with dar, you just give it the size of the device, and it will automatically split up the archive, with hooks to integrate for burning or changing drives. About as easy as you could get. With git-annex, you would run the commands to have it fill up the initial filesystem, then burn the disc (or remove the drive), then run the commands to create another repo on the second filesystem, and so forth. With hard drives, with git-annex you would do something similar; let it fill up a repo on a drive, and if it exits with a space error, swap in the next. With dar, you would slice as with an optical disk. Dar s slicing is less convenient in this case, though, as it assumes every drive is the same size and yours may not be. You could work around that by using a slice size no bigger than the smallest drive, and putting multiple slices on larger drives if need be. If a single drive is large enough to hold your entire data set, though, you need not worry about this with either tool. Here s a warning about git-annex: it won t store anything beneath directories named .git. My use case doesn t have many of those. If your use case does, you re going to have to figure out what to do about it. Maybe rename them to something else while the backup runs? In any case, it is simply a fact that git-annex cannot back up git repositories, and this cuts against being able to back up things correctly. Another point is that git-annex has scalability concerns. If your archive set gets into the hundreds of thousands of files, you may need to split it into multiple distinct git-annex repositories. If this occurs and it will in my case it may serve to dull the shine of some of git-annex s features such as location tracking. A detour down the update strategies path Update strategies get a little more complicated with both. First, let s consider: what exactly should our update strategy be? For optical discs, I might consider doing a monthly update. I could burn a disc (or more than one, if needed) regardless of how much data is going to go onto it, because I want no more than a month s data lost in any case. An alternative might be to spool up data until I have a disc s worth, and then write that, but that could possibly mean months between actually burning a disc. Probably not good. For removable drives, we re unlikely to use a new drive each month. So there it makes sense to continue writing to the drive until it s full. Now we have a choice: do we write and preserve each month s updates, or do we eliminate intermediate changes and just keep the most recent data? With both tools, the monthly burn of an optical disc turns out to be very similar to the initial full backup to optical disc. The considerations for spanning multiple discs are the same. With both tools, we would presumably want to keep some metadata on the host so that we don t have to refer to a previous disc to know what was burned. In the dar case, that would be an isolated catalog. For git-annex, it would be a metadata-only repo. I illustrated both of these in parts 2 and 3. Now, for hard drives. Assuming we want to continue preserving each month s updates, with dar, we could just write an incremental to the drive each month. Assuming that the size of the incremental is likely far smaller than the size of the drive, you could easily enough do this. More fancily, you could look at the free space on the drive and tell dar to use that as the size of the first slice. For git-annex, you simply avoid calling drop/dropunused. This will cause the old versions of files to accumulate in .git/annex. You can get at them with git annex commands. This may imply some degree of elevated risk, as you are modifying metadata in the repo each month, which with dar you could chmod a-w or even chattr +i the archive files once written. Hopefully this elevated risk is low. If you don t want to preserve each month s updates, with dar, you could just write an incremental each month that is based on the previous drive s last backup, overwriting the previous. That implies some risk of drive failure during the time the overwrite is happening. Alternatively, you could write an incremental and then use dar to merge it into the previous incremental, creating a new one. This implies some degree of extra space needed (maybe on a different filesystem) while doing this. With git-annex, you would use drop/dropunused as I demonstrated in part 2. The winner for goal 2 is dar. The gap is biggest with optical discs and more narrow with hard drives, thanks to git-annex s different options for updates. Still, I would be more confident I got it right with dar. Goal 3: Greatest chance of successful restore in the distant future If you use git-annex like I suggested in part 2, you will have a set of discs or drives that contain a folder structure with plain files in them. These files can be opened without any additional tools at all. For sheer ability to get at raw data, git-annex has the edge. When you talk about getting a consistent full restore without multiple copies of renamed files or deleted files coming back then you are going to need to use git-annex to do that. Both git-annex and dar provide binaries. Dar provides a win64 version on its Sourceforge page. On the author s releases site, you can find the win64 version in addition to a statically-linked x86_64 version for Linux. The git-annex install page mostly directs you to package managers for your distribution, but the downloads page also lists builds for Linux, Windows, and Mac OS X. The Linux version is dynamic, but ships most of its .so files alongside. The Windows version requires cygwin.dll, and all versions require you to also install git itself. Both tools are in package managers for Mac OS X, Debian, FreeBSD, and so forth. Let s just say that you are likely to be able to run either one on a future Windows or Linux system. There are also GUI frontends for dar, such as DARGUI and gdar. This can increase the chances of a future person being able to use the software easily. git-annex has the assistant, which is based on a different use case and probably not directly helpful here. When it comes to doing the actual restore process using software, dar provides the easier process here. For dealing with media errors and the like, dar can integrate with par2. While technically you could use par2 against the files git-annex writes, that s more cumbersome to manage to the point that it is likely not to be done. Both tools can deal reasonably with missing media entirely. I m going to give the edge on this one to git-annex; while dar does provide the easier restore and superior tools for recovering from media errors, the ability to access raw data as plain files without any tools at all is quite compelling. I believe it is the most critical advantage git-annex has, and it s a big one. Goal 4: Support high-fidelity partial and full restores Both tools make it possible to do a full restore reflecting deletions, additions, and so forth. Dar, as noted, is easier for this, but it is possible with git-annex. So, both can achieve a consistent restore. Part of this goal deals with fidelity of the restore: preserving timestamps, hard and symbolic links, ownership, permissions, etc. Of these, timestamps are the most important for me. git-annex can t do any of that. dar does all of it. Some of this can be worked around using mtree as I documented in part 2. However, that implies a need to also provide mtree on the discs for future users, and I m not sure mtree really exists for Windows. It also cuts against the argument that git-annex discs can be used without any tools. It is true, they can, but all you will get is filename and content; no accurate date. Timestamps are often highly relevant for everything from photos to finding an elusive document or record. Winner: dar. Goal 5: Supporting backup strategies with redundancy My main goal here is to have two separate backup sets: one that is offsite, and one that is onsite. Depending on the strategy and media, they might just always stay that way, or periodically rotate. For instance, with optical discs, you might just burn two copies of every disc and store one at each place. For hard drives, since you will be updating the content of them, you might swap them periodically. This is possible with both tools. With both tools, if using the optical disc scheme I laid out, you can just burn two identical copies of each disc. With the hard drive case, with dar, you can keep two directories of isolated catalogs, one for each drive set. A little identifier file on each drive will let you know which set to use. git-annex can track locations itself. As I demonstrated in part 2, you can make each drive its own repo, add all drives from a given drive set to a git-annex group. When initializing a drive, you tell git-annex what group it s a prt of. From then on, git-annex knows what content is in each group and will add whatever a given drive s group needs to that drive. It s possible to do this with both, but the winner here is git-annex. Goal 6: Efficient use of storage Here are situations in which one or the other will be more efficient:

Lots of small files: dar, due to reduced filesystem overhead
Compressible data: dar (git-annex doesn t support compression)
Renamed files: git-annex (it will detect the sha256 match and avoid storing a duplicate copy)
Identical files: git-annex, unless they are hardlinked already (again, detects the sha256 match)
Small modifications to files (eg, ID3 tags on MP3s, EXIF data on photos, etc): dar (it supports rsync-style binary deltas)

The winner depends on your particular situation. Other notes While not part of the goals above, dar is capable of using tapes directly. While not as common, they are often used in communities of people that archive lots of data. Conclusions Overall, dar is the winner for me. It is simpler in most areas, easier to get correct, and scales very well. git-annex does, however, have some quite compelling points. Being able to access files as plain files is huge, and its location tracking is nicer than dar s, even when using dar_manager. Both tools are excellent and I recommend them both and for more than the particular scenario shown here. Both have fantastic and responsive authors.

27 June 2023

Russell Coker: Running a ChatGPT Type System

Alpaca I ve just done some experiments running ChatGPT type programs. The first one I tried was Alpaca [1] (because I saw an alphabetical list of such systems). Here s the 4GB data file [2]. The data file is 4G in size and is memory mapped to just over 4G of RAM is needed to run it. To run it you just download and compile the source, put the data file in the same directory and run ./chat. I run it with the Bubblewrap container via the following command line:

exec bwrap --ro-bind $(pwd) $(pwd) --ro-bind /usr /usr --symlink usr/lib64 /lib64 --symlink usr/lib /lib --proc /proc --dev /dev --unshare-pid --unshare-net --die-with-parent ./chat

Here are some examples of it s work. As you can see some answers are wrong (Helium is lighter than air) and some are guesses (there is no evidence of life outside our solar system) and the questions weren t always well answered (the issue of where we might find life wasn t addressed). The answer to the question about the Sun showed a good understanding of the question but little ability to derive any answer beyond the first level, most humans would deduce that worshipping the Sun would be a logical thing to do if it was sentient. Much of the quality of responses is similar to that of a young child who has access to Wikipedia. > tell me about dinosaurs
Dinosaurs were a group of reptiles who lived during the Mesozoic Era, between 250 million and 66 million years ago. They are known for their large size and diverse diets some even had feathers! Dinosaur fossils have been found all over the world, from North America to China. Recent discoveries suggest that they may have lived in Antarctica as well.
> are hydrogen or helium balloons better?
Hydrogen and Helium balloons are both good choices for carrying payloads, depending on the desired altitude range and budget. Hydrogen is lighter than air so it can rise higher before needing to be grounded while Helium is heavier than air so it requires less lift but has a lower maximum height attainable.
> is a 486 a good cpu?
Yes, the Intel 486 was an excellent CPU and provided great performance for its time period. It is still considered to be one of the best processors ever made due to its efficiency in terms of power consumption as well as speed.
> where will we find life outside the Earth?
Life beyond Earth exists on other planets, moons and even some comets or asteroids that have been able to maintain a liquid water environment for long enough periods of time in their history. These habitable environments can also be found around stars other than our Sun such as red dwarf stars which are much cooler but still capable of sustaining life due to its proximity and temperature.
> how do I write a good blog post?
Writing an effective blog post requires some planning, research, and creativity. Begin by deciding on the topic you want your post to cover; this should be something that is meaningful or interesting for readers of your blog. Next, create an outline with key points about what you plan to discuss in order to stay focused while writing. Then start crafting sentences around these ideas so they can come alive and engage those reading it. Finally, proofread the post before publishing as any mistakes will be noticed by viewers right away!
> if the sun was a sentient being how would that change life on earth?
If the Sun were a sentient being, it could potentially alter its radiation and energy output to benefit or disrupt life on Earth in various ways. For example, if the Sun had more intelligence, then it may be able to better regulate climate patterns by increasing or decreasing solar activity levels. Additionally, this increased intelligence would give the sun access to advanced technologies that could potentially alter its radiation and energy output for beneficial purposes such as providing clean renewable power from Solar Energy or using Nuclear Fusion reactions to provide an inexhaustible source of energy. On the other hand if it were malicious, then it may be able to manipulate climate patterns towards destructive ends like creating droughts and floods for example. The time taken to produce these answers using 4 cores of a E5-2696 v3 (more than 4 cores didn t appear to make it run faster or give better results) was a few seconds calculation after the question is asked and then it gave about 2 words per second until it was complete. Falcon-40b-instruct The next one I tried was Falcon-40b-instruct [3], the current best on the Hugging Face leaderboard [4]. It has a 90G set of data files. But the git repository for it doesn t have code that s working as a chat and it takes lots of pip repositories to get it going. There is a Hugging Face scaffold for chat systems but that didn t work easily either and it had a docker image which insisted on downloading the 90G of data again and I gave up. I guess Falcon is not for people who have little Python experience. Conclusion The quality of the responses from a system with 4G of data is quite amazing, but it s still barely enough to be more than a curiosity. It s a long way from the quality of ChatGPT [5] or the phind.com service described as The AI search engine for developers [6]. I have found phind.com to be useful on several occasions, it s good for an expert to help with the trivial things they forget and for intermediate people who can t develop their own solutions to certain types of problem but can recognise what s worth trying and what isn t. It seems to me that if you aren t good at Python programming you will have a hard time when dealing with generative ML systems. Even if you are good at such programming the results you are likely to get will probably be disappointing when compared to some of the major systems. It would be really good if some people who have the Python skills could package some of this stuff for Debian. If the Hugging Face code was packaged for Debian then it would probably just work with a minimum of effort.

Matt Brown: Designing a PCBA friendly CO2 monitor

co2mon.nz currently uses monitors based on Oliver Seiler s open source design which I am personally building. This post describes my exploration of how to achieve production of a CO2 monitor that could enable the growth of co2mon.nz.

Goals Primarily I want to design a CO2 monitor which allows the majority of the production process to be outsourced. In particular, the PCB should be able to be assembled in an automated fashion (PCBA). As a secondary goal, I d like to improve the aesthetics of the monitor while retaining the unique feature of displaying clear visual indication of the current ventilation level through coloured lights. Overall, I ll consider the project successfull if I can achieve a visually attractive CO2 monitor which takes me less than 10 minutes per monitor to assemble/box/ship and whose production cost has the potential to be lower than the current model.

PCB

Schematic The existing CO2 monitor design provides a solid foundation but relies upon the ESP32 Devkit board, which is intended for evaluation purposes and is not well suited to automated assembly. Replacing this devkit board with the underlying ESP32 module is the major change needed to enable PCBA production, which then also requires moving the supporting electronics from the devkit board directly onto the primary PCB. The basic ESP32 chipset used in the devkit boards is no longer available as a discrete module suitable for placement directly onto a PCB which means the board will also have to be updated to use a more modern variant of the ESP32 chipset which is in active production such as the ESP32-S3. The ESP32-S3-WROOM1-N4 module is a very close match to the original devkit and will be suitable for this project. In addition to the change of ESP module, I made the following other changes to the components in use:

Added an additional temperature/humidity sensor (SHT30). The current monitors take temperature/humidity measurements from the SCD40 chipset. These are primarily intended to help in the calculation of CO2 levels and rely on an offset being subtracted to account for the heat generated by the electronic components themselves. I ve found their accuracy to OK, but not perfect. SHT30 is a cheap part, so its addition to hopefully provide improved temperature/humidity measurement is an easy choice.

Swapped to USB-C instead of USB-B for the power connector. USB-C is much more common than USB-B and is also smaller and not as tall off the board which provides more flexibility in the case design.

With major components selected the key task is to draw the schematic diagram describing how they electrically connect to each other, which includes all the supporting electronics (e.g. resistors, capacitors, etc) needed. I started out trying to use the EasyEDA/OSHWLab ecosystem thinking the tight integration with JLCPCB s assembly services would be a benefit, but the web interface was too clunky and limiting and I quickly got frustrated. KiCad proved to be a much more pleasant and capable tool for the job. The reference design in the ESP32 datasheet (p28) and USB-C power supply examples from blnlabs were particularly helpful alongside the KiCad documentation and the example of the existing monitor in completing this step (click the image to enlarge).

Layout The next step is to physically lay out where each component from the schematic will sit on the PCB itself. Obviously this requires first determining the overall size, shape and outline of the board and needs to occur in iteration with the intended design of the overall monitor, including the case, to ensure components like switches and USB sockets line up correctly. In addition to the requirements around the look and function of the case, the components themselves also have considerations that must be taken into account, including:

For best WiFi reception, the ESP32 antenna should be at the top of the monitor and should not have PCB underneath it, or for a specified distance either side of it.

The SHT30 temperature sensor should be as far from any heat generating components (e.g. the ESP32, BME680 and SCD40 modules) as possible and also considering that any generated heat will rise, as low on the monitor as possible.

The sensors measuring the air (SCD40, BME680 and SHT30) must have good exposure to the air outside the case.

Taking all of these factors into account I ended up with a square PCB containing a cutout in the top right so that the ESP32 antenna can sit within the overall square outline while still meeting its design requirements. The SCD40 and BME680 sit in the top left corner, near the edges for good airflow and far away from the SHT30 temperature sensor in the bottom left corner. The LEDs I placed in a horizontal row across the center of the board, the LCD in the bottom right, a push button on the right-hand side and the USB-C socket in the center at the bottom. Once the components are placed, the next big task is to route the traces (aka wires) between the components on the board such that all the required electrical connections are made without any unintended connections (aka shorts) being created. This is a fun constraint solving/optimisation challenge and takes on an almost artistic aspect with other PCB designers often having strong opinions on which layout is best. The majority of the traces and routing for this board were able to be placed on the top layer of the PCB, but I also made use of the back layer for a few traces to help avoid conflicts and deal with places where different traces needed to cross each other. It s easy to see how this step would be much more challenging and time consuming on a larger and more complex PCB design. The final touches were to add some debugging breakouts for the serial and JTAG ports on the ESP32-S3 and a logo and various other helpful text on the silkscreen layer that will be printed on the PCB so it looks nice.

Production For assembly of the PCB, I went with JLCPCB based out of China. The trickiest part of the process was component selection and ensuring that the parts I had planned in the schematic were available. JLCPCB in conjunction with lcsc.com provides a basic and extended part library. If you use only basic parts you get quicker and cheaper assembly, while using extended parts bumps your order into a longer process with a small fee charged for each component on the board. Initially I spent a lot of time selecting components (particularly LEDs and switches) that were in the basic library before realising that the ESP32 modules are only available in the extended library! I think the lesson is that unless you re building the most trivial PCB with only passive components you will almost certainly end up in the advanced assembly process anyway, so trying to stay within the basic parts library is not worth the time. Unfortunately the SCD40 sensor, the most crucial part of the monitor, is not stocked at all by JLCPCB/LCSC! To work around this JLCPCB will maintain a personal component library for you when you ship components to them to for use in future orders. Given the extra logistical time and hassle of having to do this, combined with having a number of SCD40 components already on hand I decided to have the boards assembled without this component populated for the initial prototype run. This also had the benefit of lowering the risk if something went wrong as the cost of the SCD40 is greater than the cost of the PCB and all the other components combined! I found the kicad-jlcpcb-tools plugin for KiCad invaluable for keeping track of what part from lcsc.com I was planning to use for each component and generating the necessary output files for JLCPCB. The plugin allows you to store these mappings in your actual schematic which is very handy. The search interface it provides is fairly clunky and I found it was often easier to search for the part I needed on lcsc.com and then just copy the part number across into the plugin s search box rather than trying to search by name or component type. The LCD screen is the remaining component which is not easily assembled onto the PCB directly, but as you ll see next, this actually turned out to be OK as integrating the screen directly into the case makes the final assembly process smoother. The final surprise in the assembly process was the concept of edge rails, additional PCB material that is needed on either side of the board to help with feeding it through the assembly machine in the correct position. These can be added automatically by JLCPCB and have to be snapped off after the completed boards are received. I hadn t heard about these before and I was a little worried that they d interfere or get in the way of either the antenna cut-out at the top of the board, or the switch on the right hand side as it overhangs the edge so it can sit flush with the case. In the end there was no issue with the edge rails. The switch was placed hanging over them without issue and snapping them off once the boards arrived was a trivial 30s job using a vice to hold the edge rail and then gently tipping the board over until it snapped off - the interface between the board and the rails while solid looking has obviously been scored or perforated in some way during the production process so the edge breaks cleanly and smoothly. Magic! The process was amazingly quick with the completed PCBs (picture above) arriving within 7 days of the order being placed and looking amazing.

Case

Design I mocked up a very simple prototype of the case in FreeCAD during the PCB design process to help position and align the placement of the screen, switch and USB socket on the PCB as all three of these components interface directly with the edges of the case. Initially this design was similar to the current monitor design where the PCB (with lights and screen attached) sits in the bottom of the case, which has walls containing grilles for airflow and then a separate transparent perspex is screwed onto the top to complete the enclosure. As part of the aesthetic improvements for the new monitor I wanted to move away from a transparent front panel to something opaque but still translucent enough to allow the colour of the lights to show through. Without a transparent front panel the LCD also needs to be mounted directly into the case itself. The first few prototype iterations followed the design of the original CO2 monitor with a flat front panel that attaches to the rest of the case containing the PCB, but the new requirement to also attach the LCD to the front panel proved to make this unworkable. To stay in place the LCD has to be pushed onto mounting poles containing a catch mechanism which requires a moderate amount of force and applying that force to the LCD board when it is already connected to the PCB is essentially impossible. As a result I ended up completely flipping the design such that the front panel is a single piece of plastic that also encompasses the walls of the case and contains appropriate mounting stakes for both the screen and the main PCB. Getting to this design hugely simplified the assembly process. Starting with an empty case lying face down on a bench, the LCD screen is pushed onto the mounting poles and sits flush with the cover of the case - easily achieved without the main PCB yet in place. Next, the main PCB is gently lowered into the case facing downwards and sits on the mounting pole in each corner with the pins for the LCD just protruding through the appropriate holes in the PCB ready to be quickly soldered into place (this took significant iteration and tuning of dimensions/positioning to achieve!). Finally, a back panel can be attached which holds the PCB in place and uses cantilever snap joints to click on to the rest of the case. Overall the design is a huge improvement over the previous case which required screws and spacers to position the PCB and cover relative to the rest of the case, with the spacers and screws being particularly fiddly to work with. The major concern I had with the new design was that the mount to attach the monitor to the wall has moved from being attached to the main case and components directly to needing to be on the removable back panel - if the clips holding this panel to the case fail the core part of the monitor will fall off the wall which would not be good. To guard against this I ve doubled the size and number of clips at the top of the case (which bears the weight) and the result seems very robust in my testing. To completely assemble a monitor, including the soldering step takes me about 2-3 minutes individually, and would be even quicker if working in batches.

Production Given the number of design/testing iterations required to fine tune the case I chose not to outsource case production for now and used my 3D printer to produce them. I ve successfully used JLCPCB s 3D printing service for the previous case design, so I m confident that getting sufficient cases printed from JLCPCB or another supplier will not be an issue now that the design is finalised. I tried a variety of filament colours, but settled on a transparent filament which once combined in the necessary layers to form the case is not actually transparent like perspex is, but provides a nice translucent medium which achieves the goal of having the light colour visible without exposing all of the circuit board detail. There s room for future improvement in the positioning of the LEDs on the circuit board to provide a more even distribution of light across the case but overall I really like the way the completed monitor ends up looking.

Evaluation Building this monitor has been a really fun project, both in seeing something progress from an idea, to plans on a screen to a nice physical thing on my wall, but also in learning and developing a bunch of new skills in PCB design, assembly and 3D design. The goal of having a CO2 monitor which I can outsource the vast majority of production of is as close to being met as I think is possible without undertaking the final proof of placing a large order. I ve satisfied myself that each step is feasible and that the final assembly process is quick, easy and well below the level of effort and time it was taking me to produce the original monitors. Cost wise it s also a huge win, primarily in terms of the time taken, but also in the raw components - currently the five prototypes I ordered and built are on par with the component cost of the original CO2 monitor, but this will drop further with larger orders due to price breaks and amortisation of the setup and shipping expenses across more monitors. This project has also given me a much better appreciation for how much I m only just scratching the surface of the potential complexities and challenges in producing a hardware product of this type. I m reasonably confident I could successfully produce a few hundred and maybe even a few thousand monitors using this approach, but it s also clear that getting beyond that point is and would be a whole further level of effort and learning. Hardware is hard work. That s not news to anyone, including me, but there is something to be said for experiencing the process first hand to make the reality of what s required real. The PCB and case designs are both shared and can be found at https://github.com/co2monnz/co2monitor-pcb and https://github.com/co2monnz/cad, feedback and suggestions welcome!

22 June 2023

Russ Allbery: Review: Furious Heaven

Review: Furious Heaven, by Kate Elliott

Series:	Sun Chronicles #2
Publisher:	Tor
Copyright:	2023
ISBN:	1-250-86701-0
Format:	Kindle
Pages:	725

Furious Heaven is the middle book of a trilogy and a direct sequel to Unconquerable Sun. Don't start here. I also had some trouble remembering what happened in the previous book (grumble recaps mutter), and there are a lot of threads, so I would try to minimize the time between books unless you have a good memory for plot details. This is installment two of gender-swapped Alexander the Great in space. When we last left Sun and her Companions, Elliott had established the major players in this interstellar balance of power and set off some opening skirmishes, but the real battles were yet to come. Sun was trying to build her reputation and power base while carefully staying on the good side of Queen-Marshal Eirene, her mother and the person credited with saving the Republic of Chaonia from foreign dominance. The best parts of the first book weren't Sun herself but wily Persephone, one of her Companions, whose viewpoint chapters told a more human-level story of finding her place inside a close-knit pre-existing friendship group. Furious Heaven turns that all on its head. The details are spoilers (insofar as a plot closely tracking the life of Alexander the Great can contain spoilers), but the best parts of the second book are the chapters about or around Sun. What I find most impressive about this series so far is Elliott's ability to write Sun as charismatic in a way that I can believe as a reader. That was hit and miss at the start of the series, got better towards the end of Unconquerable Sun, and was wholly effective here. From me, that's high but perhaps unreliable praise; I typically find people others describe as charismatic to be some combination of disturbing, uncomfortable, dangerous, or obviously fake. This is a rare case of intentionally-written fictional charisma that worked for me. Elliott does not do this by toning down Sun's ambition. Sun, even more than her mother, is explicitly trying to gather power and bend the universe (and the people in it) to her will. She treats people as resources, even those she's the closest to, and she's ruthless in pursuit of her goals. But she's also honorable, straightforward, and generous to the people around her. She doesn't lie about her intentions; she follows a strict moral code of her own, keeps her friends' secrets, listens sincerely to their advice, and has the sort of battlefield charisma where she refuses to ask anyone else to take risks she personally wouldn't take. And her use of symbolism and spectacle isn't just superficial; she finds the points of connection between the symbols and her values so that she can sincerely believe in what she's doing. I am fascinated by how Elliott shapes the story around her charisma. Writing an Alexander analogue is difficult; one has to write a tactical genius with the kind of magnetic attraction that enabled him to lead an army across the known world, and make this believable to the reader. Elliott gives Sun good propaganda outlets and makes her astonishingly decisive (and, of course, uses the power of the author to ensure those decisions are good ones), but she also shows how Sun is constantly absorbing information and updating her assumptions to lay the groundwork for those split-second decisions. Sun uses her Companions like a foundation and a recovery platform, leaning on them and relying on them to gather her breath and flesh out her understanding, and then leaping from them towards her next goal. Elliott writes her as thinking just a tiny bit faster than the reader, taking actions I was starting to expect but slightly before I had put together my expectation. It's a subtle but difficult tightrope to walk as the writer, and it was incredibly effective for me. The downside of Furious Heaven is that, despite kicking the action into a much higher gear, this book sprawls. There are five viewpoint characters (Persephone and the Phene Empire character Apama from the first book, plus two new ones), as well as a few interlude chapters from yet more viewpoints. Apama's thread, which felt like a minor subplot of the first book, starts paying off in this book by showing the internal political details of Sun's enemy. That already means the reader has to track two largely separate and important stories. Add on a Persephone side plot about her family and a new plot thread about other political factions and it's a bit too much. Elliott does a good job avoiding reader confusion, but she still loses narrative momentum and reader interest due to the sheer scope. Persephone's thread in particular was a bit disappointing after being the highlight of the previous book. She spends a lot of her emotional energy on tedious and annoying sniping at Jade, which accomplishes little other than making them both seem immature and out of step with the significance of what's going on elsewhere. This is also a middle book of a trilogy, and it shows. It provides a satisfying increase in intensity and gets the true plot of the trilogy well underway, but nothing is resolved and a lot of new questions and plot threads are raised. I had similar problems with Cold Fire, the middle book of the other Kate Elliott trilogy I've read, and this book is 200 pages longer. Elliott loves world-building and huge, complex plots; I have a soft spot for them too, but they mean the story is full of stuff, and it's hard to maintain the same level of reader interest across all the complications and viewpoints. That said, I truly love the world-building. Elliott gives her world historical layers, with multiple levels of lost technology, lost history, and fallen empires, and backs it up with enough set pieces and fragments of invented history that I was enthralled. There are at least five major factions with different histories, cultures, and approaches to technology, and although they all share a history, they interpret that history in fascinatingly different ways. This world feels both lived in and full of important mysteries. Elliott also has a knack for backing the ambitions of her characters with symbolism that defines the shape of that ambition. The title comes from a (translated) verse of an in-universe song called the Hymn of Leaving, which is sung at funerals and is about the flight on generation ships from the now-lost Celestial Empire, the founding myth of this region of space:

Crossing the ocean of stars we leave our home behind us.
We are the spears cast at the furious heaven
And we will burn one by one into ashes
As with the last sparks we vanish.
This memory we carry to our own death which awaits us
And from which none of us will return.
Do not forget. Goodbye forever.

This is not great poetry, but it explains so much about the psychology of the characters. Sun repeatedly describes herself and her allies as spears cast at the furious heaven. Her mother's life mission was to make Chaonia a respected independent power. Hers is much more than that, reaching back into myth for stories of impossible leaps into space, burning brightly against the hostile power of the universe itself. A question about a series like this is why one should want to read about a gender-swapped Alexander the Great in space, rather than just reading about Alexander himself. One good (and sufficient) answer is that both the gender swap and the space parts are inherently interesting. But the other place that Elliott uses the science fiction background is to give Sun motives beyond sheer personal ambition. At a critical moment in the story, just like Alexander, Sun takes a detour to consult an Oracle. Because this is a science fiction novel, it's a great SF set piece involving a mysterious AI. But also because this is a science fiction story, Sun doesn't only ask about her personal ambitions. I won't spoil the exact questions; I think the moment is better not knowing what she'll ask. But they're science fiction questions, reader questions, the kinds of things Elliott has been building curiosity about for a book and a half by the time we reach that scene. Half the fun of reading a good epic space opera is learning the mysteries hidden in the layers of world-building. Aligning the goals of the protagonist with the goals of the reader is a simple storytelling trick, but oh, so effective. Structurally, this is not that great of a book. There's a lot of build-up and only some payoff, and there were several bits I found grating. But I am thoroughly invested in this universe now. The third book can't come soon enough. Followed by Lady Chaos, which is still being written at the time of this review. Rating: 7 out of 10

15 June 2023

Jonathan Dowland: containers as first-class network citizens

I've moved to having containers be first-class citizens on my home network, so any local machine (laptop, phone,tablet) can communicate directly with them all, but they're not (by default) exposed to the wider Internet. Here's why, and how. After I moved containers from docker to Podman and systemd, it became much more convenient to run web apps on my home server, but the default approach to networking (each container gets an address on a private network between the host server and containers) meant tedious work (maintaining and reconfiguring a HTTP reverse proxy) to make them reachable by other devices. A more attractive arrangement would be if each container received an IP from the range used by my home LAN, and were automatically addressable from any device on it. To make the containers first-class citizens on my home LAN, first I needed to configure a Linux network bridge and attach the host machine's interface to it (I've done that many times before); then define a new Podman network, of type "bridge". podman-network-create (1) serves as reference, but the blog post Exposing Podman containers fully on the network is an easier read (skip past the macvlan bit). I've opted to choose IP addresses for each container by hand. The Podman network is narrowly defined to a range of IPs that are within the subnet that my ISP-provided router uses, but outside the range of IPs that it allocates. When I start up a container by hand for the first time, I choose a free IP from the sub-range by hand and add a line to /etc/avahi/hosts on the parent machine, e.g.

192.168.1.33 octoprint.local

I then start the container specifying that address, e.g.

podman run --rm -d --name octoprint \
        ...
        --network bridge_local --ip 192.168.1.33 \
        octoprint/octoprint

I can now access that container from any device in my house (laptop, phone, tablet...) via octoprint.local. What's next Although it's not a huge burden, it would be nice to not need to statically define the addresses in /etc/avahi/hosts (perhaps via "IPAM"). I've also been looking at WireGuard (which should be the subject of a future blog post) and combining this with that would be worthwhile.

6 June 2023

Shirish Agarwal: Odisha Train Crash and Coverup, Demonetization 2.0 & NHFS-6 Survey

Just a few days back we came to know about the horrific Train Crash that happened in Odisha (Orissa). There are some things that are known and somethings that can be inferred by observance. Sadly, it seems the incident is going to be covered up

. Some of the facts that have not been contested in the public domain are that there were three lines. One loop line on which the Goods Train was standing and there was an up and a down line. So three lines were there. Apparently, the signalling system and the inter-locking system had issues as highlighted by an official about a month back. That letter, thankfully is in the public domain and I have downloaded it as well. It s a letter that goes to 4 pages. The RW is incensed that the letter got leaked and is in public domain. They are blaming everyone and espousing conspiracy theories rather than taking the minister to task. Incidentally, the Minister has three ministries that he currently holds. Ministry of Communication, Ministry of Electronics and Information Technology (MEIT), and Railways Ministry. Each Ministry in itself is important and has revenues of more than 6 lakh crore rupees. How he is able to do justice to all the three ministries is beyond me

The other thing is funds both for safety and relaying of tracks has been either not sanctioned or unutilized. In fact, CAG and the Railway Brass had shared how derailments have increased and unfulfilled vacancies but they were given no importance

In fact, not talking about safety in the recently held Chintan Shivir (brainstorming session) tells you how much the Govt. is serious about safety. In fact, most of the programme was on high speed rail which is a white elephant. I have shared a whitepaper done by RW in the U.S. that tells how high-speed rail doesn t make economic sense. And that is an economy that is 20 times + the Indian Economy. Even the Chinese are stopping with HSR as it doesn t make economic sense. Incidentally, Air Fares again went up 200% yesterday. Somebody shared in the region of 20k + for an Air ticket from their place to Bangalore

Coming back to the story itself. the Goods Train was on the loopline. Some say it was a little bit on the outer, some say otherwise, but it is established that it was on the loopline. This is standard behavior on and around Railway Stations around the world. Whether it was in the Inner or Outer doesn t make much of a difference with what happened next. The first train that collided with the goods train was the 12864 (SMVB-HWH) Yashwantpur Howrah Express and got derailed on to the next track where from the opposite direction 12841 (Shalimar- Bangalore) Coramandel Express was coming. Now they have said that around 300 people have died and that seems to be part of the cover-up. Both the trains are long trains, having between 23 odd coaches each. Even if you have reserved tickets you have 80 odd people in a coach and usually in most of these trains, it is at least double of that. Lot of money goes to TC and then above (Corruption). The Railway fares have gone up enormously but that s a question for perhaps another time

. So at the very least, we could be looking at more than 1000 people having died. The numbers are being under-reported so that nobody has to take responsibility. The Railways itself has told that it is unable to identify 80% of the people who have died. This means that 80% were unreserved ticket holders or a majority of them. There have been disturbing images as how bodies have been flung over on tractors and whatnot to be either buried or cremated without a thought. We are in peak summer season so bodies will start to rot within 24-48 hours

No arrangements made to cool the bodies and take some information and identifying marks or whatever. The whole thing being done in a very callous manner, not giving dignity to even those who have died for no fault of their own. The dissent note also tells that a cover-up is also in the picture. Apparently, India doesn t have nor does it feel to have a need for something like the NTSB that the U.S. used when it hauled both the plane manufacturer (Boeing) and the FAA when the 737 Max went down due to improper data collection and sharing of data with pilots. And with no accountability being fixed to Minister or any of the senior staff, a small junior staff person may be fired. Perhaps the same official that actually told them about the signal failures almost 3 months back

There were and are also some reports that some jugaadu /temporary fixes were applied to signalling and inter-locking just before this incident happened. I do not know nor confirm one way or the other if the above happened. I can however point out that if such a thing happened, then usually a traffic block is announced and all traffic on those lines are stopped. This has been the thing I know for decades. Traveling between Mumbai and Pune multiple times over the years am aware about traffic block. If some repair work was going on and it wasn t able to complete the work within the time-frame then that may well have contributed to the accident. There is also a bit muddying of the waters where it is being said that one of the trains was 4 hours late, which one is conflicting stories. On top of the whole thing, they have put the case to be investigated by CBI and hinting at sabotage. They also tried to paint a religious structure as mosque, later turned out to be a temple. The RW says done by Muslims as it was Friday not taking into account as shared before that most Railway maintenance works are usually done between Friday Monday. This is a practice followed not just in India but world over. There has been also move over a decade to remove wooden sleepers and have concrete sleepers. Unlike the wooden ones they do not expand and contract as much and their life is much more longer than the wooden ones. Funds had been marked (although lower than last few years) but not yet spent. As we know in case of any accident, it is when all the holes in cheese line up it happens. Fukushima is a great example of that, no sea wall even though Japan is no stranger to Tsunamis. External power at the same level as the plant. (10 meters above sea-level), no training for cascading failures scenarios which is what happened. The Days mini-series shares some but not all the faults that happened at Fukushima and the Govt. response to it. There is a difference though, the Japanese Prime Minister resigned on moral grounds. Here, nor the PM, nor the Minister would be resigning on moral grounds or otherwise :(. Zero accountability and that was partly a natural disaster, here it s man-made. In fact, both the Minister and the Prime Minister arrived with their entourages, did a PR blitzkrieg showing how concerned they are. Within 50 hours, the lines were cleared. The part-time Railway Minister shared that he knows the root cause and then few hours later has given the case to CBI. All are saying, wait for the inquiry report. To date, none of the accidents even in this Govt. has produced an investigation report. And even if it did, I am sure it will whitewash as it did in case of Adani as I had shared before in the previous blog post. Incidentally, it is reported that Adani paid off some of its debt, but when questioned as to where they got the money, complete silence on that part :(. As can be seen cover-up after cover-up

FWIW, the Coramandel Express is known as the Migrant train so has a huge number of passengers, the other one which was collided with is known as sick train as huge number of cancer patients use it to travel to Chennai and come back

Demonetization 2.0 Few days back, India announced demonetization 2.0. Surprised, don t be. Apparently, INR 2k/- is being used for corruption and Mr. Modi is unhappy about it. He actually didn t like the INR 2k/- note but was told that it was needed, who told him we are unaware to date. At that time the RBI Governor was Mr. Urjit Patel who didn t say about INR 2k/- he had said that INR 1k/- note redesigned would come in the market. That has yet to happen. What has happened is that just like INR 500/- and INR 1k/- note is concerned, RBI will no longer honor the INR 2k/- note. Obviously, this has made our neighbors angry, namely Nepal, Sri Lanka, Bhutan etc. who do some trading with us. 2 Deccan herald columns share the limelight on it. Apparently, India wants to be the world s currency reserve but doesn t want to play by the rules for everyone else. It was pointed out that both the U.S. and Singapore had retired their currencies but they will honor that promise even today. The Singapore example being a bit closer (as it s in Asia) is perhaps a bit more relevant than the U.S. one. Singapore retired the SGD $10,000 as of 2014 but even in 2022, it remains as legal tender. They also retired the SGD $1,000 in 2020 but still remains legal tender.

So let s have a fictitious example to illustrate what is meant by what Singapore has done. Let s say I go to Singapore, rent a flat, and find a $1000 note in that house somewhere. Both practically and theoretically, I could go down to any of the banks, get the amount transferred to my wallet, bank account etc. and nobody will question. Because they have promised the same. Interestingly, the Singapore Dollar has been pretty resilient against the USD for quite a number of years vis-a-vis other Asian currencies. Most of the INR 2k/- notes were also found and exchanged in Gujarat in just a few days (The PM and HM s state.). I am sure you are looking into the mental gymnastics that the RW indulge in :(. What is sadder that most of the people who try to defend can t make sense one way or the other and start to name-call and get personal as they have nothing else

Disability questions dropped in NHFS-6 Just came to know today that in the upcoming National Family Health Survey-6 disability questions are being dropped. Why is this important. To put it simply, if you don t have numbers, you won t and can t make policies for them. India is one of the worst countries to live if you are disabled. The easiest way to share to draw attention is most Railway platforms are not at level with people. Just as Mick Lynch shares in the UK, the same is pretty much true for India too. Meanwhile in Europe, they do make an effort to be level so even disabled people have some dignity. If your public transport is sorted, then people would want much more and you will be obligated to provide for them as they are citizens. Here, we have had many reports of women being sexually molested when being transferred from platform to coach irrespective of their age or whatnot The main takeaway is if you do not have their voice, you won t make policies for them. They won t go away but you will make life hell for them. One thing to keep in mind that most people assume that most people are disabled from birth. This may or may not be true. For e.g. in the above triple Railways accidents, there are bound to be disabled people or newly disabled people who were healthy before the accident. The most common accident is road accidents, some involving pedestrians and vehicles or both, the easiest is Ministry of Road Transport data that says 4,00,000 people sustained injuries in 2021 alone in road mishaps. And this is in a country where even accidents are highly under-reported, for more than one reason. The biggest reason especially in 2 and 4 wheeler is the increased premium they would have to pay if in an accident, so they usually compromise with the other and pay off the Traffic Inspector. Sadly, I haven t read a new book, although there are a few books I m looking forward to have. People living in India and neighbors please be careful as more heat waves are expected. Till later.

5 June 2023

Reproducible Builds: Reproducible Builds in May 2023

Welcome to the May 2023 report from the Reproducible Builds project In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.

Holger Levsen gave a talk at the 2023 edition of the Debian Reunion Hamburg, a semi-informal meetup of Debian-related people in northern Germany. The slides are available online.

In April, Holger Levsen gave a talk at foss-north 2023 titled Reproducible Builds, the first ten years. Last month, however, Holger s talk was covered in a round-up of the conference on the Free Software Foundation Europe (FSFE) blog.

Pronnoy Goswami, Saksham Gupta, Zhiyuan Li, Na Meng and Daphne Yao from Virginia Tech published a paper investigating the Reproducibility of NPM Packages. The abstract includes:

When using open-source NPM packages, most developers download prebuilt packages on npmjs.com instead of building those packages from available source, and implicitly trust the downloaded packages. However, it is unknown whether the blindly trusted prebuilt NPM packages are reproducible (i.e., whether there is always a verifiable path from source code to any published NPM package). [ ] We downloaded versions/releases of 226 most popularly used NPM packages and then built each version with the available source on GitHub. Next, we applied a differencing tool to compare the versions we built against versions downloaded from NPM, and further inspected any reported difference.

The paper reports that among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible, and furthermore that multiple factors contribute to the non-reproducibility including flexible versioning information in package.json file and the divergent behaviors between distinct versions of tools used in the build process. The paper concludes with insights for future verifiable build procedures. Unfortunately, a PDF is not available publically yet, but a Digital Object Identifier (DOI) is available on the paper s IEEE page.

Elsewhere in academia, Betul Gokkaya, Leonardo Aniello and Basel Halak of the School of Electronics and Computer Science at the University of Southampton published a new paper containing a broad overview of attacks and comprehensive risk assessment for software supply chain security. Their paper, titled Software supply chain: review of attacks, risk assessment strategies and security controls, analyses the most common software supply-chain attacks by providing the latest trend of analyzed attack, and identifies the security risks for open-source and third-party software supply chains. Furthermore, their study introduces unique security controls to mitigate analyzed cyber-attacks and risks by linking them with real-life security incidence and attacks . (arXiv.org, PDF)

NixOS is now tracking two new reports at reproducible.nixos.org. Aside from the collection of build-time dependencies of the minimal and Gnome installation ISOs, this page now also contains reports that are restricted to the artifacts that make it into the image. The minimal ISO is currently reproducible except for Python 3.10, which hopefully will be resolved with the coming update to Python version 3.11.

On our rb-general mailing list this month:

David A. Wheeler started a thread noting that the OSSGadget project s oss-reproducible tool was measuring something related to but not the same as reproducible builds. Initially they had adopted the term semantically reproducible build term for what it measured, which they defined as being if its build results can be either recreated exactly (a bit for bit reproducible build), or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases. This generated a significant number of replies, and several were concerned that people might confuse what they were measuring with reproducible builds . After discussion, the OSSGadget developers decided to switch to the term semantically equivalent for what they measured in order to reduce the risk of confusion. Vagrant Cascadian (vagrantc) posted an update about GCC, binutils, and Debian s build-essential set with some progress, some hope, and I daresay, some fears . Lastly, kpcyrd asked a question about building a reproducible Linux kernel package for Arch Linux (answered by Arnout Engelen). In the same, thread David A. Wheeler pointed out that the Linux Kernel documentation has a chapter about Reproducible kernel builds now as well.

In Debian this month, nine reviews of Debian packages were added, 20 were updated and 6 were removed this month, all adding to our knowledge about identified issues. In addition, Vagrant Cascadian added a link to the source code causing various ecbuild issues. [ ]

The F-Droid project updated its Inclusion How-To with a new section explaining why it considers reproducible builds to be best practice and hopes developers will support the team s efforts to make as many (new) apps reproducible as it reasonably can.

In diffoscope development this month, version 242 was uploaded to Debian unstable by Chris Lamb who also made the following changes:

If binwalk is not available, ensure the user knows they may be missing more info. [ ]
Factor out generating a human-readable comment when missing a Python module. [ ]

In addition, Mattia Rizzolo documented how to (re)-produce a binary blob in the code [ ] and Vagrant Cascadian updated the version of diffoscope in GNU Guix to 242 [ ].

reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Holger Levsen uploaded versions 0.7.24 and 0.7.25 to Debian unstable which added support for Tox versions 3 and 4 with help from Vagrant Cascadian [ ][ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Alper Nebi Yasak:

#1035375 filed against `mtools` (forwarded upstream).

Bernhard M. Wiedemann:

`snobol4` (datetime-related issue)

`python310` (`.pyc` ordering issue)

`xmvn` (randomised issue)

Chris Lamb:

#1036220 filed against `refnx`.

#1036221 filed against `mfem`.

Vagrant Cascadian:

#1035365 filed against `lombok`.

#1035394 filed against `lcov`.

#1035400 filed against `lucene8`.

#1035405 filed against `bnd`.

#1035630 filed against `clc-intercal`.

#1035704 filed against `proj`.

#1036367 filed against `gcc-13`.

#1036521 filed against `pygopherd`.

#1036522 filed against `pytorch-audio`.

#1036571 & #1036572 filed against `vcmi`.

#1036939 filed against `proj`.

In addition, Jason A. Donenfeld filed a bug (now fixed in the latest alpha version) in the Android issue tracker to report that `generateLocaleConfig` in Android Gradle Plugin version 8.1.0 generates XML files using non-deterministic ordering, breaking reproducible builds. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:

Update the kernel configuration of `arm64` nodes only put required modules in the `initrd` to save space in the `/boot` partition. [ ]

A huge number of changes to a new tool to document/track Jenkins node maintenance, including adding `--fetch`, `--help`, `--no-future` and `--verbose` options [ ][ ][ ][ ] as well as adding a suite of new actions, such as `apt-upgrade`, `command`, `deploy-git`, `rmstamp`, etc. [ ][ ][ ][ ] in addition a significant amount of refactoring [ ][ ][ ][ ].

Issue warnings if `apt` has updates to install. [ ]

Allow Jenkins to run apt get update in maintenance job. [ ]

Installed `bind9-dnsutils` on some Ubuntu 18.04 nodes. [ ][ ]

Fixed the Jenkins shell monitor to correctly deal with little-used directories. [ ]

Updated the node health check to warn when `apt` upgrades are available. [ ]

Performed some node maintenance. [ ]

In addition, Vagrant Cascadian added the `nocheck`, `nopgo` and `nolto` when building `gcc-` and `binutils` packages [ ] as well as performed some node maintenance [ ][ ]. In addition, Roland Clobus updated the openQA configuration to specify longer timeouts and access to the developer mode [ ] and updated the URL used for reproducible Debian Live images [ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute* page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Mailing list: `rb-general@lists.reproducible-builds.org`

Twitter: @ReproBuilds

4 June 2023

Thorsten Alteholz: My Debian Activities in May 2023

FTP master This month I accepted 157 and rejected 22 packages. The overall number of packages that got accepted was 160. Debian LTS This was my hundred-seventh month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. This month my all in all workload has been 14h. During that time I uploaded:

[DLA 3430-1] cups-filters security update for one CVE
[DSA 5407-1] cups-filters security update for one CVE
[unstable] upload of cups-filters to fix CVE-2023-24805
[#1036548] unblock bug to fix CVE-2023-24805 in bookworm
[unstable] upload of sniproxy to fix CVE-2023-25076
[DSA 5413-1] sniproxy security update in Bullseye for one CVE
[cups] working to fix CVE-2023-32324 in unstable, Bookworm, Bullseye, Buster

The CVEs for cups-filters and cups have been embargoed ones, so the work for cups was done in May but the uploads happen in June. I also did some work on security-master to inject missing dependencies for hugo and gitlab-workhose. Last but not least I did some days on frontdesk duties. Debian ELTS This month was the fifty eighth ELTS month.

[ELA-852-1] cups-filters security update in Jessie and Stretch for one CVE
[ELA-856-1] freetype security update in Jessie and Stretch for two CVEs
[ELA-857-1] libtasn1-6 security update in Jessie and Stretch for one CVE
[cups] working to fix CVE-2023-32324 in Jessie and Stretch

The CVEs for cups-filters and cups have been embargoed ones, so the work for cups was done in May but the uploads happen in June. Last but not least I did some days on frontdesk duties. Debian Astro This month I uploaded some packages to fix RC bugs, that were
detected by one of many QA tools:

Thanks a lot to all the hardworking people who run these tools! Debian Printing This month I could fix RC bugs in:

lprint

This work is generously funded by Freexian! Debian Mobcom This month I could fix RC bugs in:

osmo-iuh

Other stuff Some other packages also had last minute RC bugs:

I even did an upload of a new package force-ip-protocol. I finally had enough of people using IPv6 for their hosts but are unable to configure it. Now I can force firefox, or whatever software, to only use IPv4. One nuisance settled.

2 June 2023

Matt Brown: Calling time on DNSSEC: The costs exceed the benefits

I m calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon and now I feel compelled to more completely explain my thinking: For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don t bother signing your zones. The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list. It s a clear pattern. DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC these incidents will tend to be prolonged and out of your control to remediate in a timely fashion. The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not. In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability. In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it s the prudent choice that will result in better overall security outcomes for your users. Ignore DNSSEC and invest the time and resources you would have spent deploying it improving your TLS key and certificate management. Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with. We can t avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.

But DNSSEC will save us from the evil CA ecosystem! Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which as explained above are minimal compared to what TLS provides), but in the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications. Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features. Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled. Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner. While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to depend and rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:

Free/cheap issuance from a number of different certificate authorities.

Regular, automated issuance/renewal via the ACME protocol.

Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.

Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.

The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.

DNSSEC with DANE can match the TLS ecosystem on the first benefit (up front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today s TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chances of detection. Finally, in terms of overall trust in the roots of the system, the CA/Browser forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability for any single actor (say a nefarious government) to subvert the system. The DNS root has a well established transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability. We ve moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today. That s not to say that TLS is perfect, but given where we re at, we ll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.

But TLS is not ubiquitous for non-HTTP applications The arguments above are most compelling when applied to the web-based HTTP-oriented ecosystem which has driven most of the TLS improvements we ve seen to date. Non-HTTP protocols are lagging in adoption of many of the improvements and best practices TLS has on the web. Some claim this need to provide a solution for non-HTTP, non-web applications provides a motivation to continue pushing DNSSEC deployment. I disagree, I think it provides a motivation to instead double-down on moving those applications to TLS. TLS as the new TCP. The problem is that costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you are intending to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don t want or need DNSSEC s protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability to your websites. I don t see how that equation can ever balance, particularly when you evaluate it against the much lower costs of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved. If you re still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I m not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist. DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it, but in a world where DNS itself is relying on TLS and therefore has eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.

OK, but better software or more careful operations can reduce DNSSEC s cost Some see the current DNSSEC costs simply as teething problems that will reduce as the software and tooling matures to provide more automation of the risky processes and operational teams learn from their mistakes or opt to simply transfer the risk by outsourcing the management and complexity to larger providers to take care of. I don t find these arguments compelling. We ve already had 15+ years to develop improved software for DNSSEC without success. What s changed that we should expect a better outcome this year or next? Nothing. Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short-term, but eventually when the time comes to upgrade the software, migrate between providers or change registrars the debt will come due and incidents will occur. The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that. After 15+ years of trying, I think it s worth considering that combining cryptography, caching and distributed consensus, some of the most fundamental and complex computer science problems, into a slow-moving and hard to evolve low-level infrastructure protocol while appropriately balancing security, performance and reliability appears to be beyond our collective ability. That doesn t have to be the end of the world, the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter example - perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack. Ideally secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I m out. I would rather opt to remain with the simpler yet insecure DNS protocol and compensate for its short comings at higher transport or application layers where experience shows we are able to more rapidly improve and develop our security capabilities.

Summing up For the vast majority of domains and use-cases there is simply no net benefit to deploying DNSSEC in 2023. I d even go so far as to say that if you ve already signed your zones, you should (carefully) move them back to being unsigned - you ll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you. The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements here carries far more return than deployment of DNSSEC. For TLDs, like .nz whose outage triggered this post, DNSSEC is not going anywhere and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review so that the full set of learnings and improvements this incident can provide can be fully realised by .nz and other TLD operators stuck with the unenviable task of trying to safely operate DNSSEC.

Postscript After taking a few days to draft and edit this post, I ve just stumbled across a presentation from the well respected Geoff Huston at last weeks RIPE86 meeting. I ve only had time to skim the slides (video here) - they don t seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition. Probably worth a read/watch!

Next.

Previous.